Abstract

Second-order coding rate of channel coding is discussed for a general sequence of channels. The optimum second-order transmission rate with a constant error constraint $\epsilon$ is obtained by using the information spectrum method. We apply this result to the discrete memoryless case, the discrete memoryless case with a cost constraint, the additive Markovian case, and the Gaussian channel case with an energy constraint. We also clarify that the Gallager bound does not give the optimum evaluation in the second-order coding rate.

I. Introduction

Based on the channel coding theorem, there exists a sequence of codes for the given channel $W$ such that the average error probability goes to 0 when the transmission rate $R$ is less than the capacity $C_W$. That is, if the number $n$ of applications of the channel $W$ is sufficiently large, the average error probability of a good code goes to 0. In order to evaluate the average error probability with finite $n$, we often use the exponential rate of decrease, which depends on the transmission rate $R$. However, such an exponential evaluation ignores the constant factor. Therefore, it is not clear whether exponential evaluation provides a good evaluation for the average error probability when the transmission rate $R$ is close to the capacity. In fact, many researchers believe that, out of the known evaluations, the Gallager bound [1] gives the best upper bound of the average error probability in channel coding when the transmission rate is greater than the critical rate. This is because the Gallager bound provides the optimal exponential rate of decrease. In order to clarify this point, we focus on the second-order coding rate in channel coding, in which we describe the transmission length by $C_W n + R_2\sqrt{n}$. From a practical viewpoint, when the coding length is close to $C_W n$, the second-order coding rate gives a better evaluation of the average error probability than the first-order coding rate. In fact, the second-order coding rate has been applied to the evaluation of the average error probability of random coding concerning the phase basis, which is essential to the security of quantum key distribution [2]. Therefore, it is appropriate to treat the second-order coding rate from the applied viewpoint as well as the theoretical viewpoint. In the discrete memoryless case, Strassen [3] derived the optimum rate $R_2$ for an arbitrary average error probability $0 < \epsilon < 1$ using the Gaussian distribution. In the present paper, we extend his result to more general cases, i.e., the discrete memoryless case with cost constraint, the Gaussian additive noise case with the energy constraint, and the additive Markovian case. Further, our proof for the discrete memoryless case is much simpler than the original one. Indeed, since his proof is not so simple and his paper is written in German, it is quite difficult to follow his proof.

In the present paper, in order to treat this problem from a unified viewpoint, we employ the method of information spectrum, which was initiated by Han-Verdú [4] and was mainly formulated by Han [5]. The second-order coding rate is closely related to the method of information spectrum because Hayashi [6] treated this problem for fixed-length source coding and intrinsic randomness using the method of information spectrum. Hayashi [6] discussed the error probability when the compressed size is $H(P)n + a\sqrt{n}$, where $n$ is the size of the input system and $H(P)$ is the entropy of the distribution $P$ of the input system. In the method of information spectrum, we treat the general asymptotic formula, which gives the relationship between the asymptotic optimal performance and the normalized logarithm of the likelihood of the probability distribution. In order to treat a special case, we apply the general asymptotic formula to the respective information source and calculate the asymptotic stochastic behavior of the normalized logarithm of the likelihood. That is, in the information spectrum method, we have two steps: deriving the general asymptotic formula and applying the general asymptotic formula. With respect to fixed-length source coding and intrinsic randomness, the same relation holds concerning the general asymptotic formula in the second-order coding rate. However, there is a difference concerning the application of the general asymptotic formula to the independent and identical distributions. That is, while the normalized logarithm of the likelihood approaches the entropy $H(P)$ in probability in the first-order coding rate, the stochastic behavior is asymptotically described by the Gaussian distribution in the second-order coding rate. In other words, in the second step, the first-order coding rate corresponds to the law of large numbers, and the second-order coding rate corresponds to the central limit theorem.

In the present paper, we treat channel coding in the second-order coding rate, i.e., the case in which the transmission length is $C_W n + a\sqrt{n}$. Similar to the above-mentioned case, we employ the method of information spectrum. That is, we treat the general channel, which is the general sequence $\{W^n(y|x)\}$ of probability distributions without structure. As shown by Verdú-Han [14], this method enables us to characterize the asymptotic performance with only the random variable $\frac{1}{n}\log\frac{W^n_x(y)}{W^n_{P^n}(y)}$ (the normalized logarithm of the likelihood ratio between the conditional distribution and the non-conditional distribution) without any further assumption, where $W^n_{P^n}(y) := \sum_x P^n(x)W^n_x(y)$. Concerning this general asymptotic formula, if we can suitably formulate theorems in the second-order coding rate and establish an appropriate relationship between the first-order coding rate and the second-order coding rate, we can easily extend proofs concerning the first-order coding rate to those of the second-order coding rate. Therefore, there is no serious difficulty in establishing the general asymptotic formula in the second-order coding rate. In order to clarify this point, we present proofs of some relevant theorems in the first-order coding rate, even though they are known.

In order to treat the special cases, it is sufficient to apply the general asymptotic formula, i.e., to calculate the asymptotic behavior of the random variable $\frac{1}{n}\log\frac{W^n_x(y)}{W^n_{P^n}(y)}$. The additive Markovian case can be treated in the same way as fixed-length source coding and intrinsic randomness. However, the other special cases have other difficulties, which do not appear in fixed-length source coding or intrinsic randomness. The first difficulty is the optimization concerning the input distribution in the converse part of the channel coding. This problem commonly appears among the three cases, i.e., the discrete memoryless case, the discrete memoryless case with cost constraint, and the Gaussian additive noise case with the energy constraint. In the discrete memoryless case, the second-order coding rate corresponds to a simple application of the central limit theorem, while the first-order coding rate corresponds to the law of large numbers. Hence, the performance in the second-order coding rate is characterized by the variance of the logarithmic likelihood ratio, and the direct part can be easily obtained in this case. This relationship is summarized in Fig. 1.

Fig. 1. Relationship between the present result and fixed-length source coding/intrinsic randomness. The $\to$ arrow describes the direct part, and the other arrow describes the converse part.

However, there is another difficulty in the direct part for the discrete memoryless case with cost constraint and the Gaussian additive noise case with the energy constraint. In these cases, all of the encoded signals have to satisfy the cost constraint. This kind of difficulty does not appear in the first-order coding rate of either the discrete memoryless case with cost constraint or the Gaussian additive noise case with the energy constraint. This is because, in the first-order coding rate, it is sufficient to construct a code whose average error probability goes to zero, while in the second-order coding rate it is required to construct a code whose average error probability goes to a given threshold $\epsilon$. Suppose that we find a code whose average error probability goes to zero and whose average cost is less than the constraint. Then, there exists a subcode whose average error probability goes to zero and in which the costs of all encoded signals are less than the constraint. However, the same method cannot be applied when we find a code whose average error probability goes to $\epsilon$ and whose average cost is less than the constraint. In the present paper, we directly construct a code in which the costs of all encoded signals are less than the constraint.

Here, we describe the meaning of the second-order coding rate. When the transmission length is described by $nC_W + \sqrt{n}R_2$, as shown in Subsection IX-A, the optimal error can be approximately attained by random coding. Since it seems that random coding cannot be realized, our evaluation seems to be related only to the theoretical best performance. However, in quantum key distribution, it can be realized concerning the phase bases [7], [2]. In such a setting, the coding length is on the order of 10,000 or 100,000 [8]. In quantum key distribution, Hayashi [2] has applied the second-order coding rate to evaluate the phase error probability, which is directly linked to the security of the final key.

The remainder of the present paper is organized as follows. In Section II, we revisit the second-order coding rate in the stationary discrete memoryless case, and discuss the second-order coding rate in the stationary discrete memoryless case with cost constraint. In Section III, the Markovian additive channel is treated. In Section IV, the Gaussian additive noise case with the energy constraint is considered. These results are shown in Section X by employing the method of information spectrum. In the present result, the performance of information transmission is discussed in terms of the second-order coding rate using two important quantities $V_W^+$ and $V_W^-$ instead of the capacity in the discrete memoryless case. In other cases, similar quantities play the same role.

In Section V, we compare our evaluation with the Gallager bound [1] in the second-order setting. In Section VI, the properties of $V_W^+$ and $V_W^-$ are discussed. In Subsection VI-A, we discuss a typical example such that $V_W^+$ is different from $V_W^-$. In Subsection VI-B, the additivities concerning $V_W^+$ and $V_W^-$ are proved. In Section VII, the notations of the information spectrum are explained. In Section VIII, the performance of the information transmission is discussed in terms of the second-order coding rate using the information spectrum in the general case. That is, we present general formulas for the second-order coding rate. In Section IX, the theorem presented in the previous section is proved. In Section X, using the general formulas for the second-order coding rate, we demonstrate our proof of the second-order coding rate in the stationary discrete memoryless case. In this proof, the direct part is immediate. The converse part is the most difficult considered herein because we must treat the information spectrum for general input distributions in the sense of the second-order coding rate.



II. Second order coding rate in stationary discrete memoryless channels 



As the most typical case, we revisit the second-order coding rate of stationary discrete memoryless channels, in which we use an $n$-multiple application of the discrete channel $W(y|x)$, which transmits the information from the input system $\mathcal{X}$ to the output system $\mathcal{Y}$. That is, the channel considered here is given as the stationary discrete memoryless channel $W^{\times n}(y|x) := \prod_{i=1}^{n} W(y_i|x_i)$. Note that, in the present paper, $P \times P'$ ($W \times W'$) denotes the product of two distributions $P$ and $P'$ (two channels $W$ and $W'$), and $P^{\times n}$ ($W^{\times n}$) denotes the product of $n$ uses of the distribution $P$ (the channel $W$), i.e., the $n$-th independent and identical distribution (i.i.d.) of $P$ (the $n$-th stationary memoryless channel of $W$). In this case, when the transmission rate is less than the capacity $C_W^{\mathrm{DM}}$, the average error probability goes to 0 exponentially if we use a suitable encoder and the maximum likelihood decoder.

Let $N$ be the size of the transmitted information. The encoder is a map $\phi$ from $\{1,\ldots,N\}$ to $\mathcal{X}^n$, and the decoder is given by the set of subsets $\{\mathcal{D}_i\}_{i=1}^{N}$ of $\mathcal{Y}^n$, where $\mathcal{D}_i$ corresponds to the decoding region of $i \in \{1,\ldots,N\}$. Then, the code is given by the triple $(N, \phi, \{\mathcal{D}_i\}_{i=1}^{N})$ and is denoted by $\Phi$. The average error probability $P_{e,W^{\times n}}(\Phi)$ is described as

$$P_{e,W^{\times n}}(\Phi) := \frac{1}{N}\sum_{i=1}^{N} W^{\times n}_{\phi(i)}(\mathcal{D}_i^c),$$

where $W_x(y) := W(y|x)$. For simplicity, the size $N$ is denoted by $|\Phi|$. The performance of the code $\Phi$ is given by the pair of $P_{e,W^{\times n}}(\Phi)$ and $|\Phi|$. As stated by the channel coding theorem [9], the capacity is given by



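$$C_W^{\mathrm{DM}} = \max_{P} I(P,W) = \min_{Q}\max_{x} D(W_x\|Q),$$

where $Q$ is the output distribution, and

$$I(P,W) := \sum_{x} P(x) D(W_x\|W_P), \quad W_P(y) := \sum_{x} P(x)W_x(y), \quad D(P\|P') := \sum_{x} P(x)\log\frac{P(x)}{P'(x)}.$$

Thus, $Q_M := \operatorname{argmin}_{Q}\max_{x} D(W_x\|Q)$ satisfies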



(1) 



Throughout the present paper, we choose the base of the logarithm to be e. 

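Although the above channel coding theorem concerns only the first-order coding rate of the transmission length $\log N_n$, our main focus is the analysis of the second-order coding rate. When the transmission length $\log N_n$ asymptotically behaves as $nC_W^{\mathrm{DM}} + a\sqrt{n}$, the optimal average error is given as follows:

$$C_p^{\mathrm{DM}}(a, C_W^{\mathrm{DM}}|W) := \inf_{\{\Phi_n\}_{n=1}^{\infty}} \left\{ \limsup_{n\to\infty} P_{e,W^{\times n}}(\Phi_n) \,\middle|\, \liminf_{n\to\infty} \frac{1}{\sqrt{n}}\left(\log|\Phi_n| - nC_W^{\mathrm{DM}}\right) \ge a \right\}. \quad (2)$$

Fixing the average error probability, we obtain the following quantity:

$$C^{\mathrm{DM}}(\epsilon, C_W^{\mathrm{DM}}|W) := \sup_{\{\Phi_n\}_{n=1}^{\infty}} \left\{ \liminf_{n\to\infty} \frac{1}{\sqrt{n}}\left(\log|\Phi_n| - nC_W^{\mathrm{DM}}\right) \,\middle|\, \limsup_{n\to\infty} P_{e,W^{\times n}}(\Phi_n) \le \epsilon \right\}. \quad (3)$$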



We refer to this value as the optimum second-order transmission rate with the error probability $\epsilon$. In order to treat the second-order coding rate, we need the distribution function $G$ for the standard Gaussian distribution (with expectation 0 and variance 1), which is defined by

$$G(x) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\,dy.$$

In this problem, the quantity $V_{P,W}$:

$$V_{P,W} := \sum_{x} P(x)\sum_{y} W_x(y)\left(\log\frac{W_x(y)}{W_P(y)} - D(W_x\|W_P)\right)^2$$

plays an important role. By using these quantities, $C_p^{\mathrm{DM}}(a, C_W^{\mathrm{DM}}|W)$ and $C^{\mathrm{DM}}(\epsilon, C_W^{\mathrm{DM}}|W)$ are calculated in the stationary discrete memoryless case as follows.



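Theorem 1: (Strassen [3]) When the cardinality $|\mathcal{X}|$ is finite and $P_M := \operatorname{argmax}_P I(P,W)$ exists uniquely, then

$$C_p^{\mathrm{DM}}(a, C_W^{\mathrm{DM}}|W) = G\left(a/\sqrt{V_{P_M,W}}\right) \quad (4)$$

$$C^{\mathrm{DM}}(\epsilon, C_W^{\mathrm{DM}}|W) = \sqrt{V_{P_M,W}}\,G^{-1}(\epsilon). \quad (5)$$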
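
As a rough numerical illustration of Theorem 1, the following sketch evaluates $I(P,W)$ and $V_{P,W}$ from their definitions and then forms the approximation $\log N_n \approx nC_W^{\mathrm{DM}} + \sqrt{nV_{P_M,W}}\,G^{-1}(\epsilon)$ suggested by (5). The binary symmetric channel, its crossover probability 0.11, the block length 1000, and all function names are illustrative assumptions that do not appear in the paper; the approximation also ignores the $o(\sqrt{n})$ terms that the theorem does not control.

```python
import math
from statistics import NormalDist


def information_and_variance(P, W):
    """Return (I(P, W), V_{P,W}) in nats.

    P[x] is the input distribution and W[x][y] = W(y|x) is the channel matrix.
    """
    n_out = len(W[0])
    # Output distribution W_P(y) = sum_x P(x) W_x(y).
    W_P = [sum(P[x] * W[x][y] for x in range(len(P))) for y in range(n_out)]
    info, var = 0.0, 0.0
    for x, p_x in enumerate(P):
        if p_x == 0.0:
            continue
        # Pairs (W_x(y), log W_x(y)/W_P(y)) over the support of W_x.
        pairs = [(W[x][y], math.log(W[x][y] / W_P[y]))
                 for y in range(n_out) if W[x][y] > 0.0]
        d_x = sum(w * llr for w, llr in pairs)  # D(W_x || W_P)
        info += p_x * d_x
        var += p_x * sum(w * (llr - d_x) ** 2 for w, llr in pairs)
    return info, var


def approx_log_code_size(n, capacity, variance, eps):
    """Normal approximation n*C + sqrt(n*V) * G^{-1}(eps), in nats."""
    return n * capacity + math.sqrt(n * variance) * NormalDist().inv_cdf(eps)


# Hypothetical example: binary symmetric channel with crossover 0.11,
# for which the uniform input is the unique capacity-achieving distribution.
delta = 0.11
W_bsc = [[1 - delta, delta], [delta, 1 - delta]]
C_bsc, V_bsc = information_and_variance([0.5, 0.5], W_bsc)
log_size = approx_log_code_size(1000, C_bsc, V_bsc, 0.01)
```

For $\epsilon < 1/2$ the correction term is negative, so the approximate code size stays below $nC_W^{\mathrm{DM}}$, and at $\epsilon = 1/2$ the second-order term vanishes.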
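When $\{W_x\}$ is linearly independent by regarding distributions as positive vectors, the map $P \mapsto W_P$ is a one-to-one map. Then, $P_M := \operatorname{argmax}_P I(P,W)$ exists uniquely. However, when $\{W_x\}$ is not linearly independent, $\operatorname{argmax}_P I(P,W)$ is not necessarily unique. In order to treat such a case, we introduce two quantities $V_W^+$ and $V_W^-$ and two distributions $P_{M+}$ and $P_{M-}$:

$$V_W^+ := \max_{P\in\mathcal{V}} V_{P,W}, \quad V_W^- := \min_{P\in\mathcal{V}} V_{P,W},$$

$$P_{M+} := \operatorname{argmax}_{P\in\mathcal{V}} V_{P,W}, \quad P_{M-} := \operatorname{argmin}_{P\in\mathcal{V}} V_{P,W},$$

where $\mathcal{V} := \{P \,|\, I(P,W) = C_W^{\mathrm{DM}}\}$. In order to treat such a case, Theorem 1 is generalized as follows: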

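Theorem 2: (Strassen [3]) When the cardinality $|\mathcal{X}|$ is finite and the set $\mathcal{V}$ has multiple elements, (4) and (5) are generalized as

$$C_p^{\mathrm{DM}}(a, C_W^{\mathrm{DM}}|W) = \begin{cases} G\left(a/\sqrt{V_W^+}\right) & a \ge 0 \\ G\left(a/\sqrt{V_W^-}\right) & a < 0 \end{cases} \quad (6)$$

$$C^{\mathrm{DM}}(\epsilon, C_W^{\mathrm{DM}}|W) = \begin{cases} \sqrt{V_W^+}\,G^{-1}(\epsilon) & \epsilon \ge 1/2 \\ \sqrt{V_W^-}\,G^{-1}(\epsilon) & \epsilon < 1/2. \end{cases} \quad (7)$$

More precisely, the direct part

$$C_p^{\mathrm{DM}}(a, C_W^{\mathrm{DM}}|W) \le \begin{cases} G\left(a/\sqrt{V_W^+}\right) & a \ge 0 \\ G\left(a/\sqrt{V_W^-}\right) & a < 0 \end{cases}$$

$$C^{\mathrm{DM}}(\epsilon, C_W^{\mathrm{DM}}|W) \ge \begin{cases} \sqrt{V_W^+}\,G^{-1}(\epsilon) & \epsilon \ge 1/2 \\ \sqrt{V_W^-}\,G^{-1}(\epsilon) & \epsilon < 1/2 \end{cases}$$

hold without any assumption, and the converse part

$$C_p^{\mathrm{DM}}(a, C_W^{\mathrm{DM}}|W) \ge \begin{cases} G\left(a/\sqrt{V_W^+}\right) & a \ge 0 \\ G\left(a/\sqrt{V_W^-}\right) & a < 0 \end{cases}$$

$$C^{\mathrm{DM}}(\epsilon, C_W^{\mathrm{DM}}|W) \le \begin{cases} \sqrt{V_W^+}\,G^{-1}(\epsilon) & \epsilon \ge 1/2 \\ \sqrt{V_W^-}\,G^{-1}(\epsilon) & \epsilon < 1/2 \end{cases}$$

hold with the assumption $|\mathcal{X}| < \infty$.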

Next, consider the cost function $c : \mathcal{X} \to \mathbb{R}$. In this case, we often assume that all encoded signals of the code belong to the set

$$\mathcal{X}^n_{c,K} := \left\{ x \in \mathcal{X}^n \,\middle|\, \sum_{i=1}^{n} c(x_i) \le nK \right\}.$$

The maximum coding rate with the above condition is called the capacity with the cost constraint, and is given by [10]

$$C_{W,c,K}^{\mathrm{DM}} = \max_{P: E_P c(x) \le K} I(P,W) = \min_{Q}\max_{P: E_P c(x) \le K} J(P,Q,W),$$

where

$$J(P,Q,W) := \sum_{x\in\mathcal{X}} P(x) D(W_x\|Q).$$

In the same way as (2) and (3), we define the following values with the cost constraint:



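$$C_p^{\mathrm{DM}}(a, C_{W,c,K}^{\mathrm{DM}}|W,c,K) := \inf_{\{\Phi_n\}_{n=1}^{\infty}} \left\{ \limsup_{n\to\infty} P_{e,W^{\times n}}(\Phi_n) \,\middle|\, \liminf_{n\to\infty} \frac{1}{\sqrt{n}}\left(\log|\Phi_n| - nC_{W,c,K}^{\mathrm{DM}}\right) \ge a,\ \operatorname{supp}(\Phi_n) \subset \mathcal{X}^n_{c,K} \right\} \quad (8)$$

$$C^{\mathrm{DM}}(\epsilon, C_{W,c,K}^{\mathrm{DM}}|W,c,K) := \sup_{\{\Phi_n\}_{n=1}^{\infty}} \left\{ \liminf_{n\to\infty} \frac{1}{\sqrt{n}}\left(\log|\Phi_n| - nC_{W,c,K}^{\mathrm{DM}}\right) \,\middle|\, \limsup_{n\to\infty} P_{e,W^{\times n}}(\Phi_n) \le \epsilon,\ \operatorname{supp}(\Phi_n) \subset \mathcal{X}^n_{c,K} \right\} \quad (9)$$

where $\operatorname{supp}(\Phi)$ expresses the set $\{\phi(1),\ldots,\phi(N)\}$ for a code $\Phi = (N, \phi, \{\mathcal{D}_i\}_{i=1}^{N})$. We introduce two quantities $V_{W,c,K}^+$ and $V_{W,c,K}^-$ and two distributions $P_{M+,c,K}$ and $P_{M-,c,K}$:

$$V_{W,c,K}^+ := \max_{P\in\mathcal{V}_{c,K}} V_{P,W}, \quad V_{W,c,K}^- := \min_{P\in\mathcal{V}_{c,K}} V_{P,W},$$

$$P_{M+,c,K} := \operatorname{argmax}_{P\in\mathcal{V}_{c,K}} V_{P,W}, \quad P_{M-,c,K} := \operatorname{argmin}_{P\in\mathcal{V}_{c,K}} V_{P,W},$$

where $\mathcal{V}_{c,K} := \{P \,|\, I(P,W) = C_{W,c,K}^{\mathrm{DM}},\ E_P c(x) \le K\}$.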
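Theorem 3: When the cardinality $|\mathcal{X}|$ is finite,

$$C_p^{\mathrm{DM}}(a, C_{W,c,K}^{\mathrm{DM}}|W,c,K) = \begin{cases} G\left(a/\sqrt{V_{W,c,K}^+}\right) & a \ge 0 \\ G\left(a/\sqrt{V_{W,c,K}^-}\right) & a < 0 \end{cases} \quad (10)$$

$$C^{\mathrm{DM}}(\epsilon, C_{W,c,K}^{\mathrm{DM}}|W,c,K) = \begin{cases} \sqrt{V_{W,c,K}^+}\,G^{-1}(\epsilon) & \epsilon \ge 1/2 \\ \sqrt{V_{W,c,K}^-}\,G^{-1}(\epsilon) & \epsilon < 1/2. \end{cases} \quad (11)$$

More precisely, the direct part

$$C_p^{\mathrm{DM}}(a, C_{W,c,K}^{\mathrm{DM}}|W,c,K) \le \begin{cases} G\left(a/\sqrt{V_{W,c,K}^+}\right) & a \ge 0 \\ G\left(a/\sqrt{V_{W,c,K}^-}\right) & a < 0 \end{cases}$$

$$C^{\mathrm{DM}}(\epsilon, C_{W,c,K}^{\mathrm{DM}}|W,c,K) \ge \begin{cases} \sqrt{V_{W,c,K}^+}\,G^{-1}(\epsilon) & \epsilon \ge 1/2 \\ \sqrt{V_{W,c,K}^-}\,G^{-1}(\epsilon) & \epsilon < 1/2 \end{cases}$$

hold without any assumption, and the converse part

$$C_p^{\mathrm{DM}}(a, C_{W,c,K}^{\mathrm{DM}}|W,c,K) \ge \begin{cases} G\left(a/\sqrt{V_{W,c,K}^+}\right) & a \ge 0 \\ G\left(a/\sqrt{V_{W,c,K}^-}\right) & a < 0 \end{cases}$$

$$C^{\mathrm{DM}}(\epsilon, C_{W,c,K}^{\mathrm{DM}}|W,c,K) \le \begin{cases} \sqrt{V_{W,c,K}^+}\,G^{-1}(\epsilon) & \epsilon \ge 1/2 \\ \sqrt{V_{W,c,K}^-}\,G^{-1}(\epsilon) & \epsilon < 1/2 \end{cases}$$

hold with the assumption $|\mathcal{X}| < \infty$.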

Remark 1: When the sets $\mathcal{X}$ and $\mathcal{Y}$ are given as general probability spaces with general $\sigma$-fields $\sigma(\mathcal{X})$ and $\sigma(\mathcal{Y})$, the above formulation can be extended with the following definition. The channel $W$ is given by a real-valued function on $\mathcal{X}$ and $\sigma(\mathcal{Y})$ satisfying the following conditions: (i) for any $x \in \mathcal{X}$, $W_x$ is a probability measure on $\mathcal{Y}$; (ii) for any $F \in \sigma(\mathcal{Y})$, $W_x(F)$ is a measurable function on $\mathcal{X}$. $P$ takes values in probability measures on $\mathcal{X}$. Then, the summands $\sum_{x\in\mathcal{X}} P(x)$ and $\sum_{y} W_x(y)$ are replaced by $\int_{\mathcal{X}} P(dx)$ and $\int_{\mathcal{Y}} W_x(dy)$, respectively. For any distribution $Q$ on $\mathcal{Y}$, the function $\frac{W_x(y)}{Q(y)}$ is replaced by the inverse of the Radon-Nikodym derivative $\frac{dQ}{dW_x}(y)$ of $Q$ with respect to $W_x$. In this extension, the direct parts of (6), (7), (10), and (11) are valid.

III. Second order coding rate in additive Markovian channel 
Next, we focus on the additive Markovian channel, in which we assume that the additive noise obeys the transition matrix $Q(y|x)$ on the set $\mathcal{X} = \{1,\ldots,d\}$. Then, the channel $W(Q)^n(y|x)$ has the form $\prod_{i=1}^{n} Q(y_i - x_i \,|\, y_{i-1} - x_{i-1})$, where $y_0 - x_0$ is the initial state $s_0$ and the arithmetic is based on mod $d$. For simplicity, we assume that the transition matrix $Q(y|x)$ is irreducible. Then, the $n$-th marginal distribution $Q^n(x_n) := \sum_{x_1,\ldots,x_{n-1}}\prod_{i=1}^{n} Q(x_i|x_{i-1})$ approaches the stationary distribution $P_Q(x)$, which is given as the eigenvector of $Q(y|x)$ associated with the eigenvalue 1 [12]. When the conditional distribution $Q(y|x)$ is denoted by $Q_x(y)$, the normalized entropy of the joint distribution $Q^n(x^n) := \prod_{i=1}^{n} Q(x_i|x_{i-1})$ of $x^n = (x_1,\ldots,x_n)$ goes to $H(Q) := \sum_{x} P_Q(x)H(Q_x)$. Then, by defining the capacity $C_Q^{\mathrm{AM}}$ in the same way as $C_W^{\mathrm{DM}}$, the channel capacity $C_Q^{\mathrm{AM}}$ is calculated as

$$C_Q^{\mathrm{AM}} = \log d - H(Q). \quad (12)$$

Similar to $C_p^{\mathrm{DM}}(a, C_W^{\mathrm{DM}}|W)$ and $C^{\mathrm{DM}}(\epsilon, C_W^{\mathrm{DM}}|W)$, the second-order quantities $C_p^{\mathrm{AM}}(a, C_Q^{\mathrm{AM}}|W)$ and $C^{\mathrm{AM}}(\epsilon, C_Q^{\mathrm{AM}}|W)$ are defined for the additive Markovian case. In this problem, the variance $V(Q)$:

$$V(Q) := \sum_{y,x} Q(y|x)P_Q(x)\left(-\log Q(y|x) - H(Q)\right)^2 + 2\sum_{z,y,x} Q(z|y)Q(y|x)P_Q(x)\left(-\log Q(z|y) - H(Q)\right)\left(-\log Q(y|x) - H(Q)\right)$$

plays an important role. By using these quantities, $C_p^{\mathrm{AM}}(a, C_Q^{\mathrm{AM}}|W)$ and $C^{\mathrm{AM}}(\epsilon, C_Q^{\mathrm{AM}}|W)$ are calculated in the additive Markovian case as follows.

Theorem 4: The relations

$$C_p^{\mathrm{AM}}(a, C_Q^{\mathrm{AM}}|W) = G\left(a/\sqrt{V(Q)}\right)$$

$$C^{\mathrm{AM}}(\epsilon, C_Q^{\mathrm{AM}}|W) = \sqrt{V(Q)}\,G^{-1}(\epsilon)$$

hold.
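
The quantities entering Theorem 4 can be evaluated numerically for a small chain. The sketch below is only an illustration under stated assumptions: the two-state transition matrix is hypothetical, the stationary distribution is found by a simple averaged fixed-point iteration (not a method taken from the paper), and `variance_as_printed` implements the expression for $V(Q)$ exactly as it is printed above, i.e., the variance term plus twice the lag-one covariance term.

```python
import math


def stationary_distribution(Q, iterations=5000):
    """Stationary distribution of the chain with Q[x][y] = Q(y|x)."""
    d = len(Q)
    p = [1.0 / d] * d
    for _ in range(iterations):
        step = [sum(p[x] * Q[x][y] for x in range(d)) for y in range(d)]
        # Averaging with the previous iterate (a "lazy" chain) avoids
        # oscillation for periodic chains and keeps the same fixed point.
        p = [0.5 * a + 0.5 * b for a, b in zip(p, step)]
    return p


def entropy_rate(Q, p):
    """H(Q) = sum_x P_Q(x) H(Q_x), in nats."""
    d = len(Q)
    return -sum(p[x] * Q[x][y] * math.log(Q[x][y])
                for x in range(d) for y in range(d) if Q[x][y] > 0.0)


def variance_as_printed(Q, p):
    """V(Q) following the printed formula (lag-zero plus twice lag-one term)."""
    d = len(Q)
    H = entropy_rate(Q, p)

    def g(nxt, cur):
        # Centered self-information of the transition cur -> nxt.
        return -math.log(Q[cur][nxt]) - H

    pairs = [(x, y) for x in range(d) for y in range(d) if Q[x][y] > 0.0]
    first = sum(Q[x][y] * p[x] * g(y, x) ** 2 for x, y in pairs)
    cross = sum(Q[y][z] * Q[x][y] * p[x] * g(z, y) * g(y, x)
                for x, y in pairs for z in range(d) if Q[y][z] > 0.0)
    return first + 2.0 * cross


# Hypothetical two-state noise process.
Q_example = [[0.9, 0.1], [0.2, 0.8]]
p_example = stationary_distribution(Q_example)
capacity_am = math.log(len(Q_example)) - entropy_rate(Q_example, p_example)  # (12)
V_example = variance_as_printed(Q_example, p_example)
```

When all rows of the transition matrix coincide, the noise is i.i.d., the cross term vanishes, and the expression reduces to the ordinary variance of $-\log Q(y)$.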



IV. Second order coding rate in Gaussian channel 
In this section, we consider the case of additive Gaussian noise. In this case, both the input system and the output system are given by $\mathbb{R}$, and the output distribution $W_x(y)$ is given by $\frac{1}{\sqrt{2\pi N}}e^{-\frac{(y-x)^2}{2N}}$ for a given noise level $N$. If there is no restriction on the input signal, the capacity diverges. Hence, it is natural to consider the cost constraint. Consider the cost function $c(x) := x^2$ and the maximum cost $S$. Then, the maximum mutual information $\max_{P: E_P x^2 \le S} I(P,W)$ is attained when $P$ is equal to $P_M(x) := \frac{1}{\sqrt{2\pi S}}e^{-\frac{x^2}{2S}}$, for which

$$D(W_x\|W_{P_M}) = \frac{1}{2}\log\left(1+\frac{S}{N}\right) + \frac{x^2 - S}{2(N+S)}. \quad (13)$$

Then, the capacity is known to be [9], [11]

$$C^{G}_{N,S} := \max_{P: E_P x^2 \le S} I(P,W) = \frac{1}{2}\log\left(1+\frac{S}{N}\right).$$

Since

$$\int\left(\log\frac{W_x(y)}{W_{P_M}(y)} - D(W_x\|W_{P_M})\right)^2 W_x(y)\,dy = \frac{\frac{S^2}{N^2} + \frac{2x^2}{N}}{2\left(1+\frac{S}{N}\right)^2},$$

$V_{P_M,W}$ is calculated as

$$V_{P_M,W} = \frac{\frac{S^2}{N^2} + \frac{2S}{N}}{2\left(1+\frac{S}{N}\right)^2}.$$
Since the cardinality of $\mathbb{R}$ is infinite, the assumption of Section II does not hold. That is, we cannot apply Theorem 3. However, the following theorem holds.

Theorem 5: Define the quantities $C_p^{G}(a, C^{G}_{N,S}|N,S)$ and $C^{G}(\epsilon, C^{G}_{N,S}|N,S)$ in the same way as (8) and (9). Then,

$$C_p^{G}(a, C^{G}_{N,S}|N,S) = G\left(a/\sqrt{V_{P_M,W}}\right)$$

$$C^{G}(\epsilon, C^{G}_{N,S}|N,S) = \sqrt{V_{P_M,W}}\,G^{-1}(\epsilon).$$
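
A small numerical sketch of Theorem 5 follows. It assumes the closed form $V_{P_M,W} = \frac{(S/N)^2 + 2S/N}{2(1+S/N)^2}$, obtained by averaging the conditional variance of the log-likelihood ratio over the Gaussian input $P_M$; the function names, the noise and power values, and the block length are illustrative choices rather than quantities from the paper, and the $o(\sqrt{n})$ terms are ignored.

```python
import math
from statistics import NormalDist


def gaussian_capacity(noise, power):
    """(1/2) log(1 + S/N) in nats."""
    return 0.5 * math.log(1.0 + power / noise)


def gaussian_variance(noise, power):
    """Assumed closed form ((S/N)^2 + 2 S/N) / (2 (1 + S/N)^2)."""
    snr = power / noise
    return (snr ** 2 + 2.0 * snr) / (2.0 * (1.0 + snr) ** 2)


def approx_log_code_size(n, noise, power, eps):
    """Normal approximation n*C + sqrt(n*V) * G^{-1}(eps), in nats."""
    c = gaussian_capacity(noise, power)
    v = gaussian_variance(noise, power)
    return n * c + math.sqrt(n * v) * NormalDist().inv_cdf(eps)


# Hypothetical setting: unit noise level, unit power, 2000 channel uses.
size_estimate = approx_log_code_size(2000, 1.0, 1.0, 0.001)
```

In this form the variance depends only on the ratio $S/N$ and approaches $1/2$ as that ratio grows.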

V. Comparison with the Gallager bound 

At first glance, the Gallager bound [1] seems to work well for evaluating the average error probability, even when the transmission length is close to $nC_W^{\mathrm{DM}}$. This is because this bound gives the optimal exponential rate when the coding rate is greater than the critical rate. In this section, we clarify whether the present evaluation or the Gallager bound [1] provides a better evaluation when the transmission length is close to $nC_W^{\mathrm{DM}}$. For this analysis, we describe the transmission length by $nC_W^{\mathrm{DM}} + \sqrt{n}R_2$. Let us compare the present evaluation with the Gallager bound, which is given by

$$\min_{\Phi: |\Phi| \le e^{nR}} P_{e,W^{\times n}}(\Phi) \le \min_{P}\min_{0\le s\le 1} e^{n(Rs+\psi_P(s))}, \quad (14)$$

where

$$\psi_P(s) := \log\sum_{y}\left(\sum_{x} P(x)W_x(y)^{\frac{1}{1+s}}\right)^{1+s}.$$

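Since the present evaluation is essentially based on Verdú-Han's method [14], this comparison can be regarded as a comparison between Verdú-Han's evaluation and the Gallager bound. Next, we substitute $nC_W^{\mathrm{DM}} + \sqrt{n}R_2$ into $nR$. Then,

$$\min_{0\le s\le 1} e^{n(Rs+\psi_P(s))} = e^{n\min_{0\le s\le 1}\left(C_W^{\mathrm{DM}}s + \frac{R_2}{\sqrt{n}}s + \psi_P(s)\right)}.$$

Taking the derivatives of $\psi_P(s)$, we obtain

$$\left.\frac{d\psi_P(s)}{ds}\right|_{s=0} = -I(P,W), \quad \left.\frac{d^2\psi_P(s)}{ds^2}\right|_{s=0} = V_{P,W}.$$

When $C_W^{\mathrm{DM}} = I(P,W)$,

$$C_W^{\mathrm{DM}}s + \frac{R_2}{\sqrt{n}}s + \psi_P(s) \cong C_W^{\mathrm{DM}}s + \frac{R_2}{\sqrt{n}}s - I(P,W)s + \frac{V_{P,W}}{2}s^2 = \frac{R_2}{\sqrt{n}}s + \frac{V_{P,W}}{2}s^2 = \frac{V_{P,W}}{2}\left(s + \frac{R_2}{\sqrt{n}V_{P,W}}\right)^2 - \frac{R_2^2}{2nV_{P,W}}.$$

Therefore, as is rigorously shown in the Appendix, when $R_2 < 0$,

$$\lim_{n\to\infty} n\min_{0\le s\le 1}\left(C_W^{\mathrm{DM}}s + \frac{R_2}{\sqrt{n}}s + \psi_P(s)\right) = -\frac{R_2^2}{2V_{P,W}}. \quad (15)$$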

Next, we set $P$ as $P_{M-}$. Then, the Gallager bound yields

$$C_p^{\mathrm{DM}}(R_2, C_W^{\mathrm{DM}}|W) \le e^{-\frac{R_2^2}{2V_W^-}} \quad (16)$$

for any $R_2 < 0$. That is, the gap between our evaluation and the Gallager bound is equal to the difference between $G\left(\frac{R_2}{\sqrt{V_W^-}}\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{R_2/\sqrt{V_W^-}} e^{-x^2/2}\,dx$ and $e^{-\frac{R_2^2}{2V_W^-}}$. Although the former is smaller than the latter, both exponential rates coincide in the limit $R_2 \to -\infty$. Since we can consider that the Gallager bound gives the trivial bound for $R_2 \ge 0$, both evaluations are illustrated in Fig. 2.

Fig. 2. Comparison between the present evaluation and the Gallager bound. The solid line indicates the Gallager bound, and the dotted line indicates the present evaluation.

Next, we consider the same comparison for the additive Markovian case. The Gallager bound is given by

$$\min_{\Phi: |\Phi| \le e^{nR}} P_{e,W(Q)^n}(\Phi) \le \min_{0\le s\le 1} e^{n(Rs+\psi_{Q,n}(s))},$$

where

$$\psi_{Q,n}(s) := -s\log d + \frac{1+s}{n}\log\left(\sum_{x^n} Q^n(x^n)^{\frac{1}{1+s}}\right).$$



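Since the asymptotic first and second cumulants of the random variable $\log Q^n(x^n)$ are $-H(Q)$ and $V(Q)$, we have

$$\lim_{n\to\infty}\frac{1}{n}\log\sum_{x^n} Q^n(x^n)^{1+t} \cong -H(Q)t + \frac{V(Q)}{2}t^2$$

as $t \to 0$. Thus,

$$\lim_{n\to\infty}\psi_{Q,n}(s) \cong -C_Q^{\mathrm{AM}}s + \frac{V(Q)}{2}s^2.$$

Substituting $nC_Q^{\mathrm{AM}} + \sqrt{n}R_2$ and $\frac{t}{\sqrt{n}}$ into $nR$ and $s$, we have

$$\lim_{n\to\infty} n\left(C_Q^{\mathrm{AM}}\frac{t}{\sqrt{n}} + \frac{R_2}{\sqrt{n}}\frac{t}{\sqrt{n}} + \psi_{Q,n}\left(\frac{t}{\sqrt{n}}\right)\right) = R_2 t + \frac{V(Q)}{2}t^2 = \frac{V(Q)}{2}\left(t + \frac{R_2}{V(Q)}\right)^2 - \frac{R_2^2}{2V(Q)}.$$

Therefore, when $R_2 < 0$, choosing $s = -\frac{R_2}{V(Q)\sqrt{n}}$, we obtain

$$\limsup_{n\to\infty}\min_{\Phi: |\Phi| \le e^{nC_Q^{\mathrm{AM}} + \sqrt{n}R_2}} P_{e,W(Q)^n}(\Phi) \le e^{-\frac{R_2^2}{2V(Q)}},$$

which has the same form as (16).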

In both cases, when $-3 < R_2 < 2$, the difference is not so small. In such a case, it is better to use the present evaluation. That is, the Gallager bound does not give the best evaluation in this case. This conclusion is opposite to the exponential evaluation when the rate is greater than the critical rate. Han [5] calculated the exponential rate of the present bound, and found that it is worse than that of the Gallager bound.¹

Moreover, a similar conclusion was obtained in the LDPC case. Kabashima and Saad [13] compared the Gallager upper 
bound of the average error probability and the approximation of the average error probability by the replica method. That 
is, they compared both thresholds of the rate, i.e., both maximum transmission rates at which the respective error probability 
goes to zero. In their study (Table 1 of [13]), they pointed out that there exists a non-negligible difference between these two 
thresholds in the LDPC case. This information may be helpful for discussing the performance of the Gallager bound. 



¹ This description was provided in the original Japanese version, but not in the English translation.

VI. Properties of $V_W^+$ and $V_W^-$

A. Example

In this section, we consider a typical example in which $V_W^+$ is different from $V_W^-$. For this purpose, we choose two parameters $q_1, q_2 \in [0,1]$ satisfying

$$0 < 2q_1 - q_2 < 1$$

$$h(q_1) - \frac{h(q_2) + h(2q_1 - q_2)}{2} < -\log\max\{q_1, 1 - q_1\}, \quad (17)$$

where $h(x) := -x\log x - (1-x)\log(1-x)$. According to the following three conditions (i), (ii), and (iii), we define the five joint distributions $W_1$, $W_2$, $W_3$, $W_4$, and $W_5$ on two random variables $A = 0,1$ and $B = 0,1$. In the following, $Q^A$ ($Q^B$) denotes the marginal distribution of $Q$ concerning $A$ ($B$).

(i) Uniformity on $A$

All distributions are assumed to satisfy

$$W_i^A(0) = 1/2.$$

(ii) Same marginal distribution on $B$ for $i = 1,2$

Two random variables $A = 0,1$ and $B = 0,1$ are not independent in $W_1$ and $W_2$, but $W_1$ and $W_2$ have the same marginal distribution on $B$. That is,

$$W_1^B(0|A=0) = W_2^B(0|A=1) = q_2$$

$$W_1^B(0|A=1) = W_2^B(0|A=0) = 2q_1 - q_2.$$

Thus, $W_1$ and $W_2$ satisfy

$$W_1^B(0) = W_2^B(0) = q_1.$$

(iii) Independence between $A$ and $B$ for $i = 3,4,5$

Due to the condition (17), there exist two solutions for $x$ in the following equation because $d(x\|q_1)$ is monotone increasing in $(q_1, 1)$ and is monotone decreasing in $(0, q_1)$:

$$h(q_1) - \frac{h(q_2) + h(2q_1 - q_2)}{2} = d(x\|q_1),$$

where

$$d(x\|y) := x\log\frac{x}{y} + (1-x)\log\frac{1-x}{1-y}.$$

Letting $p_1$ and $p_2$ be these two solutions, we define three distributions $W_3$, $W_4$, and $W_5$, in which two random variables $A = 0,1$ and $B = 0,1$ are independent, by

$$W_3^B(0) = p_1, \quad W_4^B(0) = p_2, \quad W_5^B(0) = q_1.$$

From the construction, we can check that

$$D(W_i\|W_5) = h(q_1) - \frac{h(q_2) + h(2q_1 - q_2)}{2} \quad (18)$$

for $i = 1,2,3,4$. Consider the subsets

$$Z_0 := \{Q \,|\, Q^A(0) = 1/2\}$$

$$Z_1 := \{Q \in Z_0 \,|\, Q^B(0) = q_1\}$$

$$Z_2 := \{Q \in Z_0 \,|\, Q^B(0|A=0) = Q^B(0|A=1)\}.$$

Then, $Z_1 \cap Z_2 = \{W_5\}$. Hence, the relationship among $Z_0$, $Z_1$, $Z_2$, $W_1$, $W_2$, $W_3$, $W_4$, and $W_5$ is shown in Fig. 3. For any distribution $Q$,

Then, the following lemma holds. 

Lemma 1:

$$\operatorname{argmin}_{Q}\max_{x=1,2} D(W_x\|Q) = \operatorname{argmin}_{Q\in Z_1}\max_{x=1,2} D(W_x\|Q) \quad (19)$$

$$\operatorname{argmin}_{Q}\max_{x=3,4} D(W_x\|Q) = \operatorname{argmin}_{Q\in Z_2}\max_{x=3,4} D(W_x\|Q). \quad (20)$$

Fig. 3. $Z_0$, $Z_1$, $Z_2$, $W_1$, $W_2$, $W_3$, $W_4$, and $W_5$.

Therefore, (18) implies that

$$\operatorname{argmin}_{Q}\max_{x=1,2,3,4} D(W_x\|Q) = W_5$$

and

$$\min_{Q}\max_{x=1,2} D(W_x\|Q) = \min_{Q\in Z_1}\max_{x=1,2} D(W_x\|Q) = h(q_1) - \frac{h(q_2) + h(2q_1 - q_2)}{2}$$

$$\min_{Q}\max_{x=3,4} D(W_x\|Q) = \min_{Q\in Z_2}\max_{x=3,4} D(W_x\|Q) = h(q_1) - \frac{h(q_2) + h(2q_1 - q_2)}{2}.$$

That is, the capacity of the channel $x = 1,2,3,4 \mapsto W_x$ is calculated as

$$C_W^{\mathrm{DM}} = \min_{Q}\max_{x=1,2,3,4} D(W_x\|Q) = h(q_1) - \frac{h(q_2) + h(2q_1 - q_2)}{2}.$$


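Then, the set $\mathcal{V}$ is given by the convex hull of $P = (1/2, 1/2, 0, 0)$ and $P' = \left(0, 0, \frac{q_1 - p_2}{p_1 - p_2}, \frac{p_1 - q_1}{p_1 - p_2}\right)$. Thus, $V_{\lambda P + (1-\lambda)P',W} = \lambda V_{P,W} + (1-\lambda)V_{P',W}$. When $V_{P,W} < V_{P',W}$,

$$V_W^+ = V_{P',W}, \quad V_W^- = V_{P,W}.$$

Otherwise,

$$V_W^+ = V_{P,W}, \quad V_W^- = V_{P',W}.$$

Our numerical analysis (Fig. 4) suggests the relation $V_{P,W} < V_{P',W}$.

Fig. 4. Comparison between $V_1 = V_{P,W}$ (dotted line) and $V_2 = V_{P',W}$ (solid line), plotted against $q_1$ for $q_2 = 0.1$.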
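
The construction of this example can be reproduced numerically. The sketch below is an illustration under assumptions: it takes the defining equation for $p_1$ and $p_2$ to be $d(x\|q_1) = h(q_1) - \frac{h(q_2)+h(2q_1-q_2)}{2}$, in agreement with (18), solves it by bisection (a choice made here, not in the paper), and uses the hypothetical parameters $q_1 = 0.3$, $q_2 = 0.1$. The weight `lam` is chosen so that the mixture of $W_3$ and $W_4$ reproduces $W_5$; all function names are invented for this sketch.

```python
import math


def h(x):
    """Binary entropy in nats."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def d(x, y):
    """Binary relative entropy d(x || y) with the convention 0 log 0 = 0."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / y)
    if x < 1.0:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def solve(target, q1, lo, hi, increasing):
    """Bisection for d(x || q1) = target on an interval where d is monotone."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (d(mid, q1) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def joint(r0, r1):
    """Joint distribution on (A, B) with uniform A and P(B=0 | A=a) = r_a."""
    return [0.5 * r0, 0.5 * (1 - r0), 0.5 * r1, 0.5 * (1 - r1)]


def divergence_and_variance(Wx, Q):
    """Return D(Wx || Q) and the variance of log(Wx/Q) under Wx."""
    llr = [(w, math.log(w / q)) for w, q in zip(Wx, Q) if w > 0.0]
    D = sum(w * l for w, l in llr)
    return D, sum(w * (l - D) ** 2 for w, l in llr)


q1, q2 = 0.3, 0.1  # hypothetical parameters satisfying both conditions
target = h(q1) - (h(q2) + h(2 * q1 - q2)) / 2
p1 = solve(target, q1, q1, 1.0, increasing=True)
p2 = solve(target, q1, 0.0, q1, increasing=False)

W = [joint(q2, 2 * q1 - q2), joint(2 * q1 - q2, q2),
     joint(p1, p1), joint(p2, p2)]
W5 = joint(q1, q1)
stats = [divergence_and_variance(Wx, W5) for Wx in W]

lam = (q1 - p2) / (p1 - p2)  # weight making lam*W3 + (1-lam)*W4 equal to W5
V_P = 0.5 * stats[0][1] + 0.5 * stats[1][1]
V_Pprime = lam * stats[2][1] + (1 - lam) * stats[3][1]
```

Printing `V_P` and `V_Pprime` for several values of `q1` gives a quick way to compare the two quantities plotted in Fig. 4.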



Proof of Lemma 1: For this proof, we define the maps $\mathcal{E}_A$ and $\mathcal{E}_B$ as

$$(\mathcal{E}_A Q)(A = a, B = b) := P^A(a)Q(B = b|A = a)$$

$$(\mathcal{E}_B Q)(A = a, B = b) := P^B(b)Q(A = a|B = b),$$

where $P^A(0) = 1/2$ and $P^B(0) = q_1$. When the distribution $Q'$ satisfies $Q'^A = P^A$, the following Pythagorean-type equality

$$D(Q'\|Q) = D(Q'\|\mathcal{E}_A(Q)) + D(\mathcal{E}_A(Q)\|Q) \quad (21)$$

holds. Similarly, when the distribution $Q'$ satisfies $Q'^B = P^B$, the following Pythagorean-type equality

$$D(Q'\|Q) = D(Q'\|\mathcal{E}_B(Q)) + D(\mathcal{E}_B Q\|Q) \quad (22)$$

holds. Define $Q_{2k} := \underbrace{\mathcal{E}_B \circ \mathcal{E}_A \circ \cdots \circ \mathcal{E}_B \circ \mathcal{E}_A}_{2k} Q$ and $Q_{2k+1} := \underbrace{\mathcal{E}_A \circ \mathcal{E}_B \circ \mathcal{E}_A \circ \cdots \circ \mathcal{E}_B \circ \mathcal{E}_A}_{2k+1} Q$. Then, $D(Q_{2k+1}\|Q_{2k}) = D(\mathcal{E}_A Q_{2k}\|\mathcal{E}_A Q_{2k-1}) \le D(Q_{2k}\|Q_{2k-1})$, and $D(Q_{2k}\|Q_{2k-1}) \le D(Q_{2k-1}\|Q_{2k-2})$. For any $Q' \in Z_1$, we have

$$D(Q'\|Q) = D(Q'\|Q_n) + \sum_{k=1}^{n} D(Q_k\|Q_{k-1}).$$

Thus, $D(Q_k\|Q_{k-1})$ converges to zero. Therefore, there exists a distribution $Q_\infty$ such that $Q_k \to Q_\infty$. Hence,

$$D(Q'\|Q) = D(Q'\|Q_\infty) + \sum_{k=1}^{\infty} D(Q_k\|Q_{k-1}),$$

which implies (19).

Further, for any $P_2 \in Z_2$, we assume that $Q$ satisfies $Q^A = P^A$. Since the concavity of $\log$ implies the inequality $\log\sum_a P^A(a)Q(B = b|A = a) \ge \sum_a P^A(a)\log Q(B = b|A = a)$, the following Pythagorean-type inequality

$$\begin{aligned}
D(P_2\|Q) &= -H(P_2) - \sum_a\sum_b P_2^A(a)P_2^B(b)\log Q(a,b) \\
&= -H(P_2) - \sum_a P_2^A(a)\log Q^A(a) - \sum_a\sum_b P_2^A(a)P_2^B(b)\log Q(B = b|A = a) \\
&= -H(P_2) - \sum_a P_2^A(a)\log Q^A(a) - \sum_b P_2^B(b)\log Q^B(b) + \sum_b P_2^B(b)\log Q^B(b) - \sum_a\sum_b P_2^A(a)P_2^B(b)\log Q(B = b|A = a) \\
&= D(P_2\|P^A \times Q^B) + \sum_b P_2^B(b)\log Q^B(b) - \sum_b P_2^B(b)\sum_a P^A(a)\log Q(B = b|A = a) \\
&= D(P_2\|P^A \times Q^B) + \sum_b P_2^B(b)\left[\log\sum_a P^A(a)Q(B = b|A = a) - \sum_a P^A(a)\log Q(B = b|A = a)\right] \\
&\ge D(P_2\|P^A \times Q^B) \quad (23)
\end{aligned}$$

holds. Combination of (22) and (23) yields (20).

■
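
The Pythagorean-type relation (21) used in the proof can be checked numerically. The sketch below does so for one pair of joint distributions on $(A, B)$; the particular numbers are arbitrary illustrative choices, selected only so that $Q'$ has the $A$-marginal $P^A = (1/2, 1/2)$, and the helper names are not from the paper.

```python
import math


def kl(P, Q):
    """Relative entropy D(P || Q) in nats for distributions given as lists."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0.0)


def project_A(Q, PA):
    """(E_A Q)(a, b) = P^A(a) Q(b | a) for a joint Q stored as Q[a][b]."""
    return [[PA[a] * Q[a][b] / sum(Q[a]) for b in range(len(Q[a]))]
            for a in range(len(Q))]


def flat(M):
    """Flatten a joint distribution Q[a][b] into a list."""
    return [v for row in M for v in row]


PA = [0.5, 0.5]
Q = [[0.10, 0.30], [0.45, 0.15]]        # arbitrary joint distribution
Q_prime = [[0.20, 0.30], [0.05, 0.45]]  # arbitrary joint with A-marginal PA

EQ = project_A(Q, PA)
lhs = kl(flat(Q_prime), flat(Q))
rhs = kl(flat(Q_prime), flat(EQ)) + kl(flat(EQ), flat(Q))
```

The two sides agree up to rounding because both $D(Q'\|Q) - D(Q'\|\mathcal{E}_A Q)$ and $D(\mathcal{E}_A Q\|Q)$ reduce to $D(P^A\|Q^A)$ when $Q'^A = P^A$.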

B. Additivity

The capacity satisfies the additivity condition. That is, for any two channels $\{W_x(y)\}$ and $\{W'_{x'}(y')\}$, the combined channel $\{(W \times W')_{x,x'}(y,y') = W_x(y)W'_{x'}(y')\}$ satisfies the following:

$$C_{W\times W'}^{\mathrm{DM}} = C_W^{\mathrm{DM}} + C_{W'}^{\mathrm{DM}}.$$

Similarly, as mentioned in the following lemma, $V_W^+$ and $V_W^-$ satisfy the additivity condition.

Lemma 2: The equations

$$V_{W\times W'}^+ = V_W^+ + V_{W'}^+ \quad (24)$$

$$V_{W\times W'}^- = V_W^- + V_{W'}^- \quad (25)$$

hold.
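
The additivity in Lemma 2 rests on the fact that $V_{P,W}$ itself is additive over product inputs and product channels. The following sketch checks only this elementary identity, $V_{P_1 \times P_2, W \times W'} = V_{P_1,W} + V_{P_2,W'}$, and not the optimization over the set $\mathcal{V}$; the two channels and the input distributions are arbitrary illustrative choices, and the helper names are not from the paper.

```python
import math


def variance(P, W):
    """V_{P,W} = sum_x P(x) Var_{W_x}[log W_x(y)/W_P(y)], in nats^2."""
    n_out = len(W[0])
    W_P = [sum(P[x] * W[x][y] for x in range(len(P))) for y in range(n_out)]
    total = 0.0
    for x, p_x in enumerate(P):
        if p_x == 0.0:
            continue
        pairs = [(W[x][y], math.log(W[x][y] / W_P[y]))
                 for y in range(n_out) if W[x][y] > 0.0]
        d_x = sum(w * llr for w, llr in pairs)
        total += p_x * sum(w * (llr - d_x) ** 2 for w, llr in pairs)
    return total


def product_channel(W1, W2):
    """Rows indexed by (x, x') and columns by (y, y'), first index outermost."""
    return [[a * b for a in row1 for b in row2] for row1 in W1 for row2 in W2]


def product_input(P1, P2):
    """Product distribution with the same (x, x') ordering as product_channel."""
    return [a * b for a in P1 for b in P2]


W1 = [[0.9, 0.1], [0.1, 0.9]]  # binary symmetric channel (illustrative)
W2 = [[1.0, 0.0], [0.3, 0.7]]  # Z-channel (illustrative)
P1, P2 = [0.5, 0.5], [0.6, 0.4]

V_joint = variance(product_input(P1, P2), product_channel(W1, W2))
V_sum = variance(P1, W1) + variance(P2, W2)
```

The identity holds for any product input because the log-likelihood ratio of the product channel splits into a sum of two independent terms.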

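Proof of Lemma 2: We choose the distributions $Q$ and $Q'$ as

$$Q := \operatorname{argmin}_{Q}\max_{x} D(W_x\|Q)$$

$$Q' := \operatorname{argmin}_{Q'}\max_{x'} D(W'_{x'}\|Q').$$

Then,

$$Q \times Q' = \operatorname{argmin}_{Q''}\max_{x,x'} D(W_x \times W'_{x'}\|Q'').$$

Assume that a distribution $P$ with the random variables $x$ and $x'$ satisfies the following:

$$\sum_{x,x'} P(x,x')\,W_x \times W'_{x'} = Q \times Q', \quad (26)$$

$$I(P, W \times W') = C_W^{\mathrm{DM}} + C_{W'}^{\mathrm{DM}}. \quad (27)$$

Then, the marginal distributions $P_1$ and $P_2$ of $P$ concerning $x$ and $x'$ satisfy

$$I(P_1, W) = C_W^{\mathrm{DM}}, \quad I(P_2, W') = C_{W'}^{\mathrm{DM}},$$

which implies

$$D(W_x\|Q) = C_W^{\mathrm{DM}}, \quad D(W'_{x'}\|Q') = C_{W'}^{\mathrm{DM}}$$

for $x \in \operatorname{supp}(P_1)$ and $x' \in \operatorname{supp}(P_2)$, where $\operatorname{supp}(P)$ denotes the support of the distribution $P$. Hence,

$$\begin{aligned}
V_{P,W\times W'} &= \sum_{x,x'} P(x,x')\left[\sum_{y,y'} W_x(y)W'_{x'}(y')\left(\log\frac{W_x(y)}{Q(y)} + \log\frac{W'_{x'}(y')}{Q'(y')}\right)^2 - \left(D(W_x\|Q)^2 + D(W'_{x'}\|Q')^2 + 2D(W_x\|Q)D(W'_{x'}\|Q')\right)\right] \\
&= V_{P_1,W} + V_{P_2,W'}.
\end{aligned}$$

Therefore, when the conditions (26) and (27) are satisfied, the maximum of $V_{P,W\times W'}$ is equal to $V_W^+ + V_{W'}^+$, which implies (24). Similarly, we obtain (25). ■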

The same fact holds with the cost constraint. The capacity with the cost constraint satisfies the additivity condition. That is, for any two cost functions $c$ and $c'$ for channels $\{W_x(y)\}$ and $\{W'_{x'}(y')\}$, the combined cost $(c + c')(x,x') := c(x) + c'(x')$ satisfies the following:

$$C_{W\times W',c+c',K+K'}^{\mathrm{DM}} = C_{W,c,K}^{\mathrm{DM}} + C_{W',c',K'}^{\mathrm{DM}}.$$

The quantities $V_{W,c,K}^+$ and $V_{W,c,K}^-$ satisfy the additivity condition.

Lemma 3: The equations

$$V_{W\times W',c+c',K+K'}^+ = V_{W,c,K}^+ + V_{W',c',K'}^+ \quad (28)$$

$$V_{W\times W',c+c',K+K'}^- = V_{W,c,K}^- + V_{W',c',K'}^- \quad (29)$$

hold.

This lemma can be proven in the same way as Lemma 2 by replacing the definitions of $Q$ and $Q'$ by

$$Q := \operatorname{argmin}_{Q}\max_{P: E_P c(x) \le K}\sum_{x} P(x)D(W_x\|Q)$$

$$Q' := \operatorname{argmin}_{Q'}\max_{P': E_{P'} c'(x') \le K'}\sum_{x'} P'(x')D(W'_{x'}\|Q').$$

VII. Notations of the information spectrum 

A. Information Spectrum 

In the present paper, we treat general channels. First, we focus on two sequences of probability spaces $\{\mathcal{X}_n\}_{n=1}^{\infty}$ of the input signal and $\{\mathcal{Y}_n\}_{n=1}^{\infty}$ of the output signal, and a sequence of probability transition matrices $W := \{W^n(y|x)\}_{n=1}^{\infty}$. We also focus on a sequence of distributions on input systems $P := \{P^n\}_{n=1}^{\infty}$. The asymptotic behavior of the logarithmic likelihood ratio between $W^n_x(y) := W^n(y|x)$ and $W^n_{P^n}(y) := \sum_{x \in \mathcal{X}_n} P^n(x) W^n(y|x)$ can be characterized by the following quantities:

$$I_p(R|P,W) := \limsup_{n\to\infty} \sum_x P^n(x) W^n_x \left\{ \frac{1}{n} \log \frac{W^n_x(y)}{W^n_{P^n}(y)} < R \right\}$$
$$I(\epsilon|P,W) := \sup\{R \mid I_p(R|P,W) \le \epsilon\} = \inf\{R \mid I_p(R|P,W) > \epsilon\}$$

for $0 \le \epsilon < 1$. Focusing on a sequence of distributions on output systems $Q := \{Q^n\}_{n=1}^{\infty}$, we can define

$$J_p(R|P,Q,W) := \limsup_{n\to\infty} \sum_x P^n(x) W^n_x \left\{ \frac{1}{n} \log \frac{W^n_x(y)}{Q^n(y)} < R \right\}$$
$$J(\epsilon|P,Q,W) := \sup\{R \mid J_p(R|P,Q,W) \le \epsilon\} = \inf\{R \mid J_p(R|P,Q,W) > \epsilon\}$$

for $0 \le \epsilon < 1$.

When the channel $W^n$ is the n-th stationary discrete memoryless channel $W^{\times n}$ of $W_x(y)$ and the probability distribution $P = \{P^n\}$ is the n-th independent and identical distribution $P^{\times n}$ of $P$, the law of large numbers guarantees that $I(\epsilon|P,W)$ coincides with the mutual information $I(P,W) := \sum_x P(x) \sum_y W_x(y) \log \frac{W_x(y)}{W_P(y)}$. For a more detailed description of asymptotic
behavior, we focus on the second order of the coding length for /3 < 1. In order to characterize the coefficient of the second 
order , we introduce the following quantities: 

/p(P2, Pi|P, W) = limsup ^"(^)W^." (^(log Wrr\ - ^^i) < ^2 

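$$I(\epsilon, R_1|P,W) := \sup\{R_2 \mid I_p(R_2, R_1|P,W) \le \epsilon\} = \inf\{R_2 \mid I_p(R_2, R_1|P,W) > \epsilon\}$$

for $0 \le \epsilon < 1$. Similarly, $J_p(R_2, R_1|P,Q,W)$ and $J(\epsilon, R_1|P,Q,W)$ are defined for $0 \le \epsilon < 1$. When $W$ is $W^{\times} := \{W^{\times n}\}$ and $P$ is $P^{\times} := \{P^{\times n}\}$, the second order of the coding length is $\sqrt{n}$ and the central limit theorem guarantees that $\frac{1}{\sqrt{n}} \left( \log \frac{W^{\times n}_x(y)}{W^{\times n}_{P^{\times n}}(y)} - n I(P,W) \right)$ asymptotically obeys the Gaussian distribution with expectation 0 and variance

$$V_{P,W} := \sum_x P(x) \sum_y W_x(y) \left( \log \frac{W_x(y)}{W_P(y)} - I(P,W) \right)^2.$$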

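Therefore, using the distribution function $G$ for the standard Gaussian distribution, we can express the above quantities as follows:

$$I_p(R_2, I(P,W)|P^{\times}, W^{\times}) = G(R_2/\sqrt{V_{P,W}})$$
$$I(\epsilon, I(P,W)|P^{\times}, W^{\times}) = \sqrt{V_{P,W}}\, G^{-1}(\epsilon). \qquad (30)$$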
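
As a numerical illustration of (30), the sketch below compares the exact $\epsilon$-quantile of the normalized logarithmic likelihood ratio at a finite $n$ with the Gaussian value $\sqrt{V_{P,W}}\,G^{-1}(\epsilon)$. It rests on assumptions that are not in the paper: a binary symmetric channel with crossover probability `delta`, the uniform input distribution, and the hypothetical helper names used here. For this channel the likelihood ratio depends only on the number of flipped bits, so the exact distribution is binomial.

```python
import math
from statistics import NormalDist

def bsc_information_and_variance(delta):
    """I(P,W) and V_{P,W} in nats for a BSC(delta) with uniform input P."""
    hi = math.log(2 * (1 - delta))  # log-likelihood ratio of an unflipped bit
    lo = math.log(2 * delta)        # log-likelihood ratio of a flipped bit
    mean = (1 - delta) * hi + delta * lo
    var = delta * (1 - delta) * (hi - lo) ** 2
    return mean, var

def exact_eps_quantile(n, delta, eps):
    """sup{a : Pr[Z_n < a] <= eps}, Z_n = (log-likelihood ratio - n I) / sqrt(n)."""
    mean, _ = bsc_information_and_variance(delta)
    hi, lo = math.log(2 * (1 - delta)), math.log(2 * delta)
    best, mass_below = -math.inf, 0.0
    for k in range(n, -1, -1):      # k flips; Z_n increases as k decreases
        z = (k * lo + (n - k) * hi - n * mean) / math.sqrt(n)
        if mass_below <= eps:       # mass strictly below z is still within eps
            best = z
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(delta) + (n - k) * math.log(1 - delta))
        mass_below += math.exp(log_pmf)
    return best

def gaussian_value(delta, eps):
    """Right-hand side of (30): sqrt(V_{P,W}) * G^{-1}(eps), for 0 < eps < 1."""
    _, var = bsc_information_and_variance(delta)
    return math.sqrt(var) * NormalDist().inv_cdf(eps)

exact = exact_eps_quantile(2000, 0.11, 0.1)
approx = gaussian_value(0.11, 0.1)
print(exact, approx)
```

The two values differ by the lattice spacing of the binomial distribution and by higher-order corrections, both of which vanish as $n$ grows.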

In the case of additive channels, we focus on the limiting behavior of the entropy rate of the distributions $Q = \{Q^n\}$ describing the additive noise. Similar to the above, we define the following:

$$H_p(R|Q) := \liminf_{n\to\infty} Q^n \left\{ x \in \mathcal{X}_n \,\middle|\, -\frac{1}{n} \log Q^n(x) < R \right\}$$
$$H(\epsilon|Q) := \sup\{R \mid H_p(R|Q) \le \epsilon\} = \inf\{R \mid H_p(R|Q) > \epsilon\}$$
Hp{R2,Ri\Q) =^liminf V Q" | logQ"(a;) - nPi) < R2] 

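$$H(\epsilon, R_1|Q) := \sup\{R_2 \mid H_p(R_2, R_1|Q) \le \epsilon\} = \inf\{R_2 \mid H_p(R_2, R_1|Q) > \epsilon\}$$

for $0 \le \epsilon < 1$. As is discussed in Section VII in [6], when $Q$ is given by a Markovian process $Q(y|x)$, the relationships

$$H(\epsilon|Q) = H(Q) \qquad (31)$$
$$H(\epsilon, H(Q)|Q) = \sqrt{V(Q)}\, G^{-1}(\epsilon) \qquad (32)$$
$$H_p(R_2, H(Q)|Q) = G(R_2/\sqrt{V(Q)}) \qquad (33)$$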

hold with (3 = 1/2. 

B. Stochastic limits 

In order to treat the relationship between the above quantities, we consider the limit superior in probability $\text{p-}\limsup_{n\to\infty}$ and the limit inferior in probability $\text{p-}\liminf_{n\to\infty}$, which are defined by

$$\text{p-}\limsup_{n\to\infty} Z_n|_{P_n} := \inf\{a \mid \lim_{n\to\infty} P_n\{Z_n > a\} = 0\}$$
$$\text{p-}\liminf_{n\to\infty} Z_n|_{P_n} := \sup\{a \mid \lim_{n\to\infty} P_n\{Z_n < a\} = 0\}.$$

In particular, when $\text{p-}\limsup_{n\to\infty} Z_n|_{P_n} = \text{p-}\liminf_{n\to\infty} Z_n|_{P_n} = a$, we write

$$\text{p-}\lim_{n\to\infty} Z_n|_{P_n} = a.$$

The concept $\text{p-}\liminf_{n\to\infty}$ can be generalized as

$$\epsilon\text{-p-}\liminf_{n\to\infty} Z_n|_{P_n} := \sup\{a \mid \limsup_{n\to\infty} P_n\{Z_n < a\} \le \epsilon\}.$$
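
As a worked example of this definition (an illustration under an added assumption, not a statement from the paper), suppose that $Z_n$ converges in distribution to a Gaussian variable with expectation 0 and variance $v > 0$. Writing $G$ for the standard Gaussian distribution function, continuity of $G$ gives $\limsup_{n\to\infty} P_n\{Z_n < a\} = G(a/\sqrt{v})$ for every real $a$, and hence, for $0 < \epsilon < 1$,

$$\epsilon\text{-p-}\liminf_{n\to\infty} Z_n|_{P_n} = \sup\{a \mid G(a/\sqrt{v}) \le \epsilon\} = \sqrt{v}\, G^{-1}(\epsilon).$$

In this situation the $\epsilon$-limit inferior in probability is simply the $\epsilon$-quantile of the limiting distribution, which is the form in which the central limit theorem enters the second-order formulas.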

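From the definitions, we can check the following properties:

$$\epsilon\text{-p-}\liminf_{n\to\infty} (Z_n + Y_n)|_{P_n} \ge \epsilon\text{-p-}\liminf_{n\to\infty} Z_n|_{P_n} + \text{p-}\liminf_{n\to\infty} Y_n|_{P_n} \qquad (34)$$
$$\epsilon\text{-p-}\liminf_{n\to\infty} (Z_n + Y_n)|_{P_n} \le \epsilon\text{-p-}\liminf_{n\to\infty} Z_n|_{P_n} + \text{p-}\limsup_{n\to\infty} Y_n|_{P_n}. \qquad (35)$$

As shown by Han [5], the relation

$$\text{p-}\liminf_{n\to\infty} \frac{1}{n^{\alpha}} \log \frac{P^n(x)}{P'^n(x)} \Big|_{P^n} \ge 0 \qquad (36)$$

holds for $\alpha > 0$ and any two sequences $P = \{P^n\}$ and $P' = \{P'^n\}$ of distributions with the variable $x$.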

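By using this concept, $I(\epsilon|P,W)$, $J(\epsilon|P,Q,W)$, $I(\epsilon, R_1|P,W)$, and $J(\epsilon, R_1|P,Q,W)$ are characterized by

$$I(\epsilon|P,W) = \epsilon\text{-p-}\liminf_{n\to\infty} \frac{1}{n} \log \frac{W^n_x(y)}{W^n_{P^n}(y)}$$
$$J(\epsilon|P,Q,W) = \epsilon\text{-p-}\liminf_{n\to\infty} \frac{1}{n} \log \frac{W^n_x(y)}{Q^n(y)}$$
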

/(e,i?i|P,W) - e-p-liminf^(log— 

n^oc. Wp„ (y) 



1 



J{e,Ri\P,Q,W) = e-p-liminf-^ (log 



nRi] 



Substituting $W^n_{P^n}$ and $Q^n$ into $P^n$ and $P'^n$ in (36), and using (34), we obtain

$$I(\epsilon|P,W) \le J(\epsilon|P,Q,W) \qquad (37)$$
$$I(\epsilon, R_1|P,W) \le J(\epsilon, R_1|P,Q,W). \qquad (38)$$

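Since $1 - H_p(R|Q) = \limsup_{n\to\infty} Q^n\{\frac{1}{n} \log Q^n(x) \le -R\}$, $H(\epsilon|Q)$ is characterized as

$$-H(\epsilon|Q) = -\inf\{R \mid H_p(R|Q) > \epsilon\} = \sup\{-R \mid 1 - H_p(R|Q) < 1 - \epsilon\} = (1-\epsilon)\text{-p-}\liminf_{n\to\infty} \frac{1}{n} \log Q^n(x) \Big|_{Q^n}.$$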



Similarly, 



-P(e,Pi|Q) = (l-e)-p-liminf ^(log Q"(a;) + nPi) 



Q" 



In the following, we discuss the relationship between the above-mentioned quantities and channel capacities. 

VIII. General asymptotic formulas 

A. General case 

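Next, we consider the $\epsilon$-capacity and its related quantity, which are defined by

$$C_p(R|W) := \inf_{\{\Phi_n\}} \left\{ \limsup_{n\to\infty} P_{e,W^n}(\Phi_n) \,\middle|\, \liminf_{n\to\infty} \frac{1}{n} \log |\Phi_n| \ge R \right\}$$
$$C(\epsilon|W) := \sup_{\{\Phi_n\}} \left\{ \liminf_{n\to\infty} \frac{1}{n} \log |\Phi_n| \,\middle|\, \limsup_{n\to\infty} P_{e,W^n}(\Phi_n) \le \epsilon \right\}.$$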



Concerning these quantities, the following general asymptotic formulas hold. 
Theorem 6: (Verdú & Han [14], Hayashi & Nagaoka [15]) The relations

$$C_p(R|W) = \inf_P \lim_{\gamma \downarrow 0} I_p(R - \gamma|P,W) = \inf_P \sup_Q \lim_{\gamma \downarrow 0} J_p(R - \gamma|P,Q,W) \qquad (39)$$
$$C(\epsilon|W) = \sup_P I(\epsilon|P,W) = \sup_P \inf_Q J(\epsilon|P,Q,W) \qquad (40)$$

hold for $0 \le \epsilon < 1$.

Remark 2: Historically, Verdú & Han [14] proved the first equation in (40). Hayashi & Nagaoka [15] established the second equation in (40) with $\epsilon = 0$ for the first time, even for the classical case, although their main topic was the quantum case. The relation (39) is proven for the first time in this paper.

Next, we proceed to the second-order coding rate. As a generalization of (|2]i and (O, we define the following: 



dct 



Cp(i?2,i?i|VF) = inf nimsupPe,W"($„) 



dcf 



1 



liminf ^(log |$„| - nRi) > R2 



limsupFe,vK"(^n) < e ^ • 



C(e,i?i|T^) sup niminf-.(log|$„| -ni?i) 

Similar to Theorem 6, the following general formulas for the second-order coding rate hold.
Theorem 7: The relations

$$C_p(R_2, R_1|W) = \inf_P \lim_{\gamma \downarrow 0} I_p(R_2 - \gamma, R_1|P,W) = \inf_P \sup_Q \lim_{\gamma \downarrow 0} J_p(R_2 - \gamma, R_1|P,Q,W) \qquad (43)$$
$$C(\epsilon, R_1|W) = \sup_P I(\epsilon, R_1|P,W) = \sup_P \inf_Q J(\epsilon, R_1|P,Q,W) \qquad (44)$$

hold for $0 \le \epsilon < 1$.

Indeed, Theorem 7 has greater significance than mere generalization. This theorem provides a unified viewpoint concerning the second-order asymptotic rate in channel coding and has the following merits. First, it shortens the proof of Theorem 3. Second, it enables us to extend Theorem 3 to the case of cost constraint. Third, it yields the extension to the Gaussian noise case, which has continuous input signals. Fourth, it allows us to extend the same treatment to the Markovian case with additive noise.



B. Cost constraint 

We focus on a sequence of cost functions $c = \{c_n\}_{n=1}^{\infty}$, where $c_n$ is a function from $\mathcal{X}_n$ to $\mathbb{R}$. In this case, all alphabets are
assumed to belong to the set



dcf 



i=l 



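That is, our code $\{\Phi_n\}$ is assumed to satisfy $\mathrm{supp}(\Phi_n) \subset \mathcal{X}_{n,c,K}$. Then, the capacities with cost constraint are given by

$$C_p(R|W,c,K) := \inf_{\{\Phi_n\}} \left\{ \limsup_{n\to\infty} P_{e,W^n}(\Phi_n) \,\middle|\, \liminf_{n\to\infty} \frac{1}{n} \log |\Phi_n| \ge R,\ \mathrm{supp}(\Phi_n) \subset \mathcal{X}_{n,c,K} \right\}$$
$$C(\epsilon|W,c,K) := \sup_{\{\Phi_n\}} \left\{ \liminf_{n\to\infty} \frac{1}{n} \log |\Phi_n| \,\middle|\, \limsup_{n\to\infty} P_{e,W^n}(\Phi_n) \le \epsilon,\ \mathrm{supp}(\Phi_n) \subset \mathcal{X}_{n,c,K} \right\}$$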



dcf 



CpiR2,Ri\W,c,K)= inf ilimsup Pe,wA^n) 



liminf — T(log|$n| - nRi) > i?2,supp($„) C Xn,c.K 



C{e,Ri\W,c,K)'^^^ sup <! liminf (log |$„| - ni?i) 



1 



limsupPe,W"(*n) < e,supp($„) C Xn^cK 



Concerning these quantities, the following general asymptotic formulas hold. 
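Theorem 8: (Han [5], Hayashi & Nagaoka [15]) The relations

$$C_p(R|W,c,K) = \inf_{P: \mathrm{supp}(P^n) \subset \mathcal{X}_{n,c,K}} \lim_{\gamma \downarrow 0} I_p(R - \gamma|P,W) = \inf_{P: \mathrm{supp}(P^n) \subset \mathcal{X}_{n,c,K}} \sup_Q \lim_{\gamma \downarrow 0} J_p(R - \gamma|P,Q,W) \qquad (47)$$
$$C(\epsilon|W,c,K) = \sup_{P: \mathrm{supp}(P^n) \subset \mathcal{X}_{n,c,K}} I(\epsilon|P,W) = \sup_{P: \mathrm{supp}(P^n) \subset \mathcal{X}_{n,c,K}} \inf_Q J(\epsilon|P,Q,W) \qquad (48)$$

hold for $0 \le \epsilon < 1$.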

Remark 3: Historically, Han [5] proved the first equation in (48). Hayashi & Nagaoka [15] established the second equation in (48) with $\epsilon = 0$ for the first time, even for the classical case, although their main topic was the quantum case. The relation (47) is proven for the first time in this paper.

Similar to Theorem 7, the following general formulas for the second-order coding rate hold.
Theorem 9: The relations

$$C_p(R_2, R_1|W,c,K) = \inf_{P: \mathrm{supp}(P^n) \subset \mathcal{X}_{n,c,K}} \lim_{\gamma \downarrow 0} I_p(R_2 - \gamma, R_1|P,W) = \inf_{P: \mathrm{supp}(P^n) \subset \mathcal{X}_{n,c,K}} \sup_Q \lim_{\gamma \downarrow 0} J_p(R_2 - \gamma, R_1|P,Q,W) \qquad (49)$$
$$C(\epsilon, R_1|W,c,K) = \sup_{P: \mathrm{supp}(P^n) \subset \mathcal{X}_{n,c,K}} I(\epsilon, R_1|P,W) = \sup_{P: \mathrm{supp}(P^n) \subset \mathcal{X}_{n,c,K}} \inf_Q J(\epsilon, R_1|P,Q,W) \qquad (50)$$

hold for $0 \le \epsilon < 1$.

The above theorems can be regarded as special cases of Theorems 6 and 7 by substituting the set $\mathcal{X}_{n,c,K}$ into the set $\mathcal{X}_n$. Hence, it is sufficient to show Theorems 6 and 7.



C. Additive case 

Next, we consider the case where the channel is given as a sequence of additive channels $W(Q) = \{W^n(y|x) = Q^n(y - x)\}$ on the set $\mathcal{X}^n$, where $\mathcal{X}$ has the cardinality $d$. Verdú & Han proved the following theorem.
Theorem 10: (Verdú & Han [14]) The relations

$$C_p(R|W(Q)) = 1 - \lim_{\gamma \downarrow 0} H_p(\log d - R + \gamma|Q) \qquad (51)$$
$$C(\epsilon|W(Q)) = \log d - H(1 - \epsilon|Q) \qquad (52)$$

hold for $0 \le \epsilon < 1$.

This theorem and ( fSST l imply ( |54] |. 

Remark 4: Verdú & Han proved (52) in the case of $\epsilon = 0$ at (7.2) in [14]. Other cases are proven for the first time in this paper.

Similar to Theorem 10, the following formulas for the second-order coding rate hold for general additive channels.
Theorem 11: The relations

$$C_p(R_2, R_1|W) = 1 - \lim_{\gamma \downarrow 0} H_p(-R_2 + \gamma, \log d - R_1|Q) \qquad (53)$$
$$C(\epsilon, R_1|W) = -H(1 - \epsilon, \log d - R_1|Q) \qquad (54)$$

hold for $0 \le \epsilon < 1$.

Hence, we obtain Theorem 4 from (32) and (33).
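
To make the combination of (54) with (32) concrete, the sketch below evaluates the first-order and second-order rates for an additive channel. It assumes i.i.d. noise, treated here as a special case of a Markovian process, and it assumes that $V(Q)$ is the variance of $-\log Q(x)$ under $Q$; the function names are hypothetical. With $R_1 = \log d - H(Q)$, (54) and (32) give $C(\epsilon, R_1|W) = -\sqrt{V(Q)}\,G^{-1}(1-\epsilon)$, which equals $\sqrt{V(Q)}\,G^{-1}(\epsilon)$ by the symmetry of the Gaussian distribution.

```python
import math
from statistics import NormalDist

def entropy_and_variance(q):
    """H(Q) and the variance of -log Q(x) under Q, both in nats."""
    h = -sum(p * math.log(p) for p in q if p > 0)
    v = sum(p * (-math.log(p) - h) ** 2 for p in q if p > 0)
    return h, v

def additive_first_order_rate(q):
    """log d - H(Q) for noise law q on an alphabet of size d = len(q)."""
    h, _ = entropy_and_variance(q)
    return math.log(len(q)) - h

def additive_second_order_rate(q, eps):
    """-sqrt(V(Q)) * G^{-1}(1 - eps), i.e. (54) combined with (32); 0 < eps < 1."""
    _, v = entropy_and_variance(q)
    return -math.sqrt(v) * NormalDist().inv_cdf(1 - eps)

noise = [0.9, 0.1]  # assumed example: binary additive noise
h, v = entropy_and_variance(noise)
print(additive_first_order_rate(noise), additive_second_order_rate(noise, 0.1))
```

For $\epsilon < 1/2$ the second-order rate is negative, so the admissible coding length falls below $n(\log d - H(Q))$ by a term of order $\sqrt{n}$.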

Now, using Theorems 6 and 7, we prove Theorems 10 and 11. Since $W^n_x(y) = Q^n(y - x)$, we have

$$I(\epsilon|P,W) = \epsilon\text{-p-}\liminf_{n\to\infty} \frac{1}{n} \log \frac{W^n_x(y)}{W^n_{P^n}(y)}$$
$$\le \epsilon\text{-p-}\liminf_{n\to\infty} \frac{1}{n} \log W^n_x(y) + \text{p-}\limsup_{n\to\infty} \left( -\frac{1}{n} \log W^n_{P^n}(y) \right) \qquad (55)$$
$$\le \epsilon\text{-p-}\liminf_{n\to\infty} \frac{1}{n} \log Q^n(x) \Big|_{Q^n} + \log d \qquad (56)$$
$$= \log d - H(1 - \epsilon|Q),$$

where (55) and (56) follow from (35) and (36), respectively. Since the equality holds when $P^n$ is the uniform distribution, we obtain

$$\sup_P I(\epsilon|P,W) = \log d - H(1 - \epsilon|Q),$$

which implies (52). Similarly, we can show (54).

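Since $\text{p-}\limsup_{n\to\infty} \left( -\frac{1}{n} \log W^n_{P^n}(y) \right) \le \log d$, we have

$$\limsup_{n\to\infty} \sum_x P^n(x) W^n_x \left\{ \frac{1}{n} \log \frac{W^n_x(y)}{W^n_{P^n}(y)} < R \right\}$$
$$\ge \limsup_{n\to\infty} \sum_x P^n(x) W^n_x \left\{ \frac{1}{n} \log W^n_x(y) + \log d < R \right\}$$
$$= \limsup_{n\to\infty} Q^n \left\{ \frac{1}{n} \log Q^n(x) < R - \log d \right\}$$
$$= 1 - \liminf_{n\to\infty} Q^n \left\{ -\frac{1}{n} \log Q^n(x) \le \log d - R \right\},$$

which implies that

$$I_p(R|P,W) \ge 1 - H_p(\log d - R|Q).$$

Thus, we obtain (51). Similarly, we obtain (53).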

Remark 5: When the sets $\mathcal{X}_n$ and $\mathcal{Y}_n$ are given as general probability spaces with general $\sigma$-fields $\sigma(\mathcal{X}_n)$ and $\sigma(\mathcal{Y}_n)$, the above formulation can be extended with the following definition. The n-th channel $W^n$ is given by a real-valued function of $x \in \mathcal{X}_n$ and $F \in \sigma(\mathcal{Y}_n)$ satisfying the following conditions: (i) for any $x \in \mathcal{X}_n$, $W^n_x$ is a probability measure on $\mathcal{Y}_n$; (ii) for any $F \in \sigma(\mathcal{Y}_n)$, $W^n_x(F)$ is a measurable function of $x$ on $\mathcal{X}_n$. $P$ and $Q$ take values in sequences of probability measures on $\mathcal{X}_n$ and $\mathcal{Y}_n$, respectively. Then, the summations $\sum_{x \in \mathcal{X}_n} P^n(x)$ and $\sum_{y \in \mathcal{Y}_n} W^n_x(y)$ are replaced by $\int P^n(dx)$ and $\int W^n_x(dy)$, respectively. For any distribution $Q^n$ on $\mathcal{Y}_n$, the function $\frac{W^n_x(y)}{Q^n(y)}$ is replaced by the inverse of the Radon-Nikodym derivative $\frac{dQ^n}{dW^n_x}(y)$ of $Q^n$ with respect to $W^n_x$. In the above definitions, $\inf_P$, $\sup_P$, $\inf_Q$, and $\sup_Q$ are given as the infimum and supremum among all sequences of probability measures on $\{\mathcal{X}_n\}_{n=1}^{\infty}$ and $\{\mathcal{Y}_n\}_{n=1}^{\infty}$. The following proof is also valid in this extension.

IX. Proof of the general formulas for the second-order coding rate 

In this section, we prove Theorems 6 and 7. That is, for the reader's convenience, we present a proof for the first-order coding rate, as well as that for the second-order coding rate.



A. Direct Part 

We prove the direct part, i.e., the inequalities

$$C_p(R|W) \le \inf_P \lim_{\gamma \downarrow 0} I_p(R - \gamma|P,W) \qquad (57)$$
$$C(\epsilon|W) \ge \sup_P I(\epsilon|P,W) \qquad (58)$$
$$C_p(R_2, R_1|W) \le \inf_P \lim_{\gamma \downarrow 0} I_p(R_2 - \gamma, R_1|P,W) \qquad (59)$$
$$C(\epsilon, R_1|W) \ge \sup_P I(\epsilon, R_1|P,W). \qquad (60)$$

For arbitrary $R$, using the random coding method, we show that there exists a sequence of codes $\{\Phi_n\}$ such that $\frac{1}{n} \log |\Phi_n| \to R$ and $\limsup_{n\to\infty} P_{e,W^n}(\Phi_n) \le I_p(R|P,W)$. This method is essentially the same as Verdú & Han's method [14].

First, we set the size of ^n.z.B. to be Nn = ^^R-n'^^ with the random variable Z. We generate the encoder (pz, in which 
X G A"" is chosen as (j)z{'i) with the probability P{x). Here, the choice of (pzii) is independent of the choice of other 4>z{j)- 
The decoder {I?i,z}^i is chosen by the following inductive method: 

Thus, the average error probability is evaluated as 




The second term is evaluated as 



1 iv„(7v„ - 1) ^ r 1 w:}{y) 



>R 



y2p{x)w^l-\og- 



"2 

a; 

^3/2 



>R\ 



2 - 2 

Since liminf„_ooEx^"(a;)W^x"{^logT5^Sy = IpiR\P^W), m implies that liminf„ i^zPe.w- (^-n.z) < 

$I_p(R|P,W)$. Thus, the convergence $\frac{1}{n} \log N_n \to R$ implies the inequality $C_p(R|W) \le \inf_P I_p(R|P,W)$.

Next, in order to prove (57), for any sequence $P$, we construct a code $\Phi_n$ such that $\limsup_{n\to\infty} P_{e,W^n}(\Phi_n) \le \lim_{\gamma \downarrow 0} I_p(R_0 - \gamma|P,W)$. For any $k$, we choose the integer $N_k$ such that $E_Z P_{e,W^n}(\Phi_{n,Z,R_0 - 1/k}) \le I_p(R_0 - 1/k|P,W) + 1/k$ for all $n \ge N_k$. Then, for any $n$, we choose $k(n)$ to be the maximum $k$ satisfying $n \ge N_k$. Then, $k(n) \to \infty$ as $n \to \infty$. Thus, the limit superior of $E_Z P_{e,W^n}(\Phi_{n,Z,R_0 - 1/k(n)})$ is bounded by $\lim_{\gamma \downarrow 0} I_p(R_0 - \gamma|P,W)$, and $\frac{1}{n} \log |\Phi_{n,Z,R_0 - 1/k(n)}|$ goes to $R_0$. Hence, we obtain the inequality $C_p(R_0|W) \le \inf_P \lim_{\gamma \downarrow 0} I_p(R_0 - \gamma|P,W)$, i.e., (57).

For proving ( |59] l. we choose A^„ = e"^i+"''-'^2-n'''^ Substituting +n'^i?2 into ni? in the above discussion, we denote 
the code ^n,ZM by ^n,z,Ri,B.2- Then, 



EzPe,wA'^n,z,R,,R.) < P'\x)W: ^ ^ \^\og - uR, j < R, + 

Since iv^g-(n_Ri+n''i?2) < ell^ ^ Q and ^ log ^ ^ i?2, we obtain the inequality Cp(i?2, i?i < infp /p(i?2, i?i|P, W). 

For any k, we choose the integer Nk such that iJ^Pg^vK" {^n,z.Ri,R'i-i/k) < -^j9(P2 — -Ril-P, W) + 1/fc for Vn > A^fe. 
Then, defining k{n) similarly, we obtain Ez'S>n,z,R^,R,-i/k{n) ^ lim^io Ip{R2-l, Ri\P, W), and log ''^"'''■";„k^,''^'""'' ^ 



R2. Hence, we obtain the inequality Cp{R2,Ri\W) < inff> lim^j^o /p(i?2 — "f, Ri\P,W), i.e.. 

For an arbitrary number $R < \sup_P I(\epsilon|P,W)$, there exists a sequence of input distributions $P$ such that $I_p(R|P,W) \le \epsilon$. Therefore, the inequality (58) holds. Similarly, we can show the inequality (60).

B. Converse part 

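Next, we prove the converse part, i.e.,

$$C_p(R|W) \ge \inf_P \sup_Q \lim_{\gamma \downarrow 0} J_p(R - \gamma|P,Q,W) \qquad (61)$$
$$C(\epsilon|W) \le \sup_P \inf_Q J(\epsilon|P,Q,W) \qquad (62)$$
$$C_p(R_2, R_1|W) \ge \inf_P \sup_Q \lim_{\gamma \downarrow 0} J_p(R_2 - \gamma, R_1|P,Q,W) \qquad (63)$$
$$C(\epsilon, R_1|W) \le \sup_P \inf_Q J(\epsilon, R_1|P,Q,W), \qquad (64)$$
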

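which complete our proof, because the other inequalities

$$\inf_P \lim_{\gamma \downarrow 0} I_p(R - \gamma|P,W) \le \inf_P \sup_Q \lim_{\gamma \downarrow 0} J_p(R - \gamma|P,Q,W)$$
$$\sup_P I(\epsilon|P,W) \ge \sup_P \inf_Q J(\epsilon|P,Q,W)$$
$$\inf_P \lim_{\gamma \downarrow 0} I_p(R_2 - \gamma, R_1|P,W) \le \inf_P \sup_Q \lim_{\gamma \downarrow 0} J_p(R_2 - \gamma, R_1|P,Q,W)$$
$$\sup_P I(\epsilon, R_1|P,W) \ge \sup_P \inf_Q J(\epsilon, R_1|P,Q,W)$$

are trivial based on their definitions. In the converse part, we essentially employ Hayashi-Nagaoka's [15] method. We choose an arbitrary sequence of codes $\{\Phi_n\}_{n=1}^{\infty}$. Let $R$ be $\liminf_{n\to\infty} \frac{1}{n} \log |\Phi_n|$. Assume that the code $\Phi_n$ consists of the triplet $(N_n, \phi, \{\mathcal{D}_i\}_{i=1}^{N_n})$. Then, for any sequence of output distributions $Q = \{Q^n\}_{n=1}^{\infty}$ and any real $\gamma > 0$, the inequality

$$P_{e,W^n}(\Phi_n) \ge \sum_x P_{\Phi_n}(x) W^n_x \left\{ \frac{1}{n} \log \frac{W^n_x(y)}{Q^n(y)} < R - \gamma \right\} - \frac{e^{n(R - \gamma)}}{N_n} \qquad (65)$$

holds, where $P_{\Phi_n}$ is the empirical distribution for the $|\Phi_n|$ points $(\phi(1), \ldots, \phi(N_n))$.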

Since $e^{n(R - \gamma)}/N_n \to 0$, the relation $\limsup_{n\to\infty} P_{e,W^n}(\Phi_n) \ge J_p(R - \gamma|P',Q,W)$ holds for any $Q$, where $P' := \{P_{\Phi_n}\}$. Thus, $\limsup_{n\to\infty} P_{e,W^n}(\Phi_n) \ge \sup_Q \lim_{\gamma \downarrow 0} J_p(R - \gamma|P',Q,W)$. Therefore, $\limsup_{n\to\infty} P_{e,W^n}(\Phi_n) \ge \inf_P \sup_Q \lim_{\gamma \downarrow 0} J_p(R - \gamma|P,Q,W)$, which implies (61).

Now, assume that $\limsup_{n\to\infty} P_{e,W^n}(\Phi_n) = \epsilon$. Since $e^{n(R - \gamma)}/N_n \to 0$, (65) implies that $R - \gamma \le J(\epsilon|P',Q,W)$ for any $Q$. Thus, $R - \gamma \le \sup_P \inf_Q J(\epsilon|P,Q,W)$. Since $\gamma$ is an arbitrary positive real number, $R \le \sup_P \inf_Q J(\epsilon|P,Q,W)$, which implies (62).

Next, consider the case in which liminf„^oo log ^nSj = -^2- Replacing i? — 7 by _Ri + n'^^^(i?2 — 7) in ( |65l l, we 
obtain e"-"i+^'^2 t) ^ xhus, liminf„^oo Pe.W" i^n) > infp supg lim^|o Jp{R2 — 7, ^il-P, Q, W), which implies ( |63] l. 
replacing i?i + R2n^^^ into i? — 7 in (|65] |. similar to (|62| |. we can show 
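The inequality (65) is shown as follows. Writing $R' := R - \gamma$, we focus on the inequalities:

$$W^n_{\phi(i)}(\mathcal{D}_i) - e^{nR'} Q^n(\mathcal{D}_i)$$
$$\le W^n_{\phi(i)}(\{W^n_{\phi(i)}(y) - e^{nR'} Q^n(y) > 0\}) - e^{nR'} Q^n(\{W^n_{\phi(i)}(y) - e^{nR'} Q^n(y) > 0\})$$
$$\le W^n_{\phi(i)}(\{W^n_{\phi(i)}(y) - e^{nR'} Q^n(y) > 0\}),$$

where the first inequality follows from the fact that any two distributions $P$ and $Q$ and any positive constant $a$ satisfy

$$\max_{X'} [P(X') - aQ(X')] = P\{P(\omega) - aQ(\omega) > 0\} - aQ\{P(\omega) - aQ(\omega) > 0\}.$$

Thus, since the decoding regions $\mathcal{D}_i$ are disjoint,

$$1 - P_{e,W^n}(\Phi_n) = \frac{1}{N_n} \sum_{i=1}^{N_n} W^n_{\phi(i)}(\mathcal{D}_i) \le \frac{1}{N_n} \sum_{i=1}^{N_n} \left[ e^{nR'} Q^n(\mathcal{D}_i) + W^n_{\phi(i)} \left\{ \frac{1}{n} \log \frac{W^n_{\phi(i)}(y)}{Q^n(y)} > R' \right\} \right] \le \frac{e^{nR'}}{N_n} + \sum_x P_{\Phi_n}(x) W^n_x \left\{ \frac{1}{n} \log \frac{W^n_x(y)}{Q^n(y)} > R' \right\},$$

which implies (65).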
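
The set-maximization identity used in the first inequality can be verified by brute force on a small alphabet. The sketch below is an illustration only; the distributions and the function names are hypothetical. It compares the maximum of $P(X') - aQ(X')$ over all subsets $X'$ with the value attained by the threshold set $\{\omega \mid P(\omega) - aQ(\omega) > 0\}$.

```python
from itertools import combinations

def best_set_value(p, q, a):
    """Maximum of P(X') - a Q(X') over every subset X' of the alphabet."""
    n = len(p)
    return max(sum(p[i] - a * q[i] for i in subset)
               for r in range(n + 1)
               for subset in combinations(range(n), r))

def threshold_set_value(p, q, a):
    """P(S) - a Q(S) for the threshold set S = {i : p[i] - a q[i] > 0}."""
    chosen = [i for i in range(len(p)) if p[i] - a * q[i] > 0]
    return sum(p[i] for i in chosen) - a * sum(q[i] for i in chosen)

p = [0.5, 0.3, 0.2]  # assumed example distributions
q = [0.2, 0.2, 0.6]
for a in (0.5, 1.0, 2.0):
    print(a, best_set_value(p, q, a), threshold_set_value(p, q, a))
```

The two values agree because each element contributes $P(\omega) - aQ(\omega)$ independently, so the maximizing set keeps exactly the elements with a positive contribution.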



X. Proof of the stationary memoryless case 

A. Proof of Theorem 2

In this subsection, using Theorem 7, we prove Theorem 2 when the cardinality $|\mathcal{X}|$ is finite. For this purpose, we show the following relations in the stationary discrete memoryless case, i.e., the case in which $W^n_x(y) = W^{\times n}_x(y) := \prod_{i=1}^{n} W_{x_i}(y_i)$ for $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$. In this section, abbreviating $C^{DM}_W$ as $C$, we will prove that

$$\inf_P \lim_{\gamma \downarrow 0} I_p(R_2 - \gamma, C|P,W) \le \begin{cases} G(R_2/\sqrt{V^+_W}) & R_2 \ge 0 \\ G(R_2/\sqrt{V^-_W}) & R_2 < 0 \end{cases} \qquad (66)$$

and

$$\inf_P \sup_Q \lim_{\gamma \downarrow 0} J_p(R_2 - \gamma, C|P,Q,W) \ge \begin{cases} G(R_2/\sqrt{V^+_W}) & R_2 \ge 0 \\ G(R_2/\sqrt{V^-_W}) & R_2 < 0. \end{cases} \qquad (67)$$

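Showing both inequalities and using Theorem 7, we obtain

$$C_p(R_2, C|W) = \begin{cases} G(R_2/\sqrt{V^+_W}) & R_2 \ge 0 \\ G(R_2/\sqrt{V^-_W}) & R_2 < 0. \end{cases} \qquad (68)$$

Since the r.h.s. of (68) is continuous with respect to $R_2$, (68) implies that

$$C(\epsilon, C|W) = \begin{cases} \sqrt{V^+_W}\, G^{-1}(\epsilon) & \epsilon \ge 1/2 \\ \sqrt{V^-_W}\, G^{-1}(\epsilon) & \epsilon < 1/2. \end{cases}$$

That is, we can show Theorem 2.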
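
The case distinction in the formula for $C(\epsilon, C|W)$ can be turned into a small calculator. The following sketch is hedged: the capacity and the two variances are passed in as plain numbers chosen for illustration, the function names are hypothetical, and the approximation $\log |\Phi_n| \approx nC + \sqrt{n}\, C(\epsilon, C|W)$ ignores all terms of lower order than $\sqrt{n}$.

```python
import math
from statistics import NormalDist

def second_order_capacity(eps, v_plus, v_minus):
    """sqrt(V+) G^{-1}(eps) for eps >= 1/2, sqrt(V-) G^{-1}(eps) otherwise; 0 < eps < 1."""
    v = v_plus if eps >= 0.5 else v_minus
    return math.sqrt(v) * NormalDist().inv_cdf(eps)

def approx_log_codebook_size(n, capacity, eps, v_plus, v_minus):
    """Second-order approximation n*C + sqrt(n)*C(eps, C|W), in nats."""
    return n * capacity + math.sqrt(n) * second_order_capacity(eps, v_plus, v_minus)

# Assumed illustrative numbers: C = 0.5 nats, V+ = 4.0, V- = 1.0.
for eps in (0.1, 0.5, 0.9):
    print(eps, approx_log_codebook_size(1000, 0.5, eps, 4.0, 1.0))
```

For $\epsilon < 1/2$ the Gaussian quantile is negative, so the smaller variance $V^-_W$ gives the larger rate; for $\epsilon > 1/2$ the quantile is positive and the larger variance $V^+_W$ is the favorable one. This is why the two regimes use different variances.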

In fact, when $P$ is the i.i.d. of $P_{M+}$ or $P_{M-}$, $I(\epsilon, C|P,W)$ is equal to $\sqrt{V^+_W}\, G^{-1}(\epsilon)$ or $\sqrt{V^-_W}\, G^{-1}(\epsilon)$. Thus, (66) holds. Therefore, the achievability part (the direct part) of Theorem 2 holds. Therefore, it is sufficient to prove the converse part (67).

We focus on the set $T_n$ of empirical distributions with $n$ outcomes. Its cardinality $|T_n|$ is evaluated as $|T_n| \le (n + 1)^{|\mathcal{X}|}$.
In this proof, we use the distribution 
and the sets



V, = {P|/(P, W)>C + e} 



where ep(a;) is the empirical distribution of x G X". 

Since Ql(y) > iE^^T(W,p,,,)>'»(!/) and CJfo) > ^<3i?(j/), 



= E ^"(-)P^.^" ^ 77^ ( log ^(f - ) < 



I ^ Q&(y) 



When X € V5, 



1 / Ty"(y) , 



E^^x.^ (log + l°g(l^"l + 1) - = ^ (n/(ep(x), W^) + log(|r„| + 1) - nC) 



^ log(|T„| + 1) 



Thus, Chebyshev inequality implies 



Define the quantity Vp^-s^ *== EpE^^^ (log - -^(WxIIQm))^. When x e Ve, since the random variable log (Q^)xlf(''^) = 
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E" 1 log /^T'^'s^'f'l has the variance nV' , s 



I , ^ I > min G ' ^ 



V ] P'^^' \ V' 



Since the random variable log ,^''\J'y} ^ = , log ,^°''ff'\ is written as a combination of finite number of random variables 

*= {Qm)^"(v) ^i=\ => {QM)(.Vi) 

{QM)(y)- 



{log 7^^i7^}x6Ar, the above convergence is uniform. That is, for any 5 > 0, there exists > such that for n> N, 



^ ^ 0°^ (^S|y + log(|T„| + 1) - .G ) < i? 
> min G I ,^ I - 5. 



Therefore, 



> min G I , ^ I - 6, 



where fi^ is the complement of i7„. 
Thus, 



1 / VK^"fw) 
limsupPp„ w-^<^[ log " / - nG I < i? 



> min G ( , ^ I - 6. 

Since S > and e > are arbitrary, when Q = {Qij}, 

Jp{R,C\P,Q,W) 



= limsupPp„,^.„|-^^log^ 



G < i? 



> min G 
Pev 



\ ^ I G(i?/^\^+) i? > 



which implies (67) because of the continuity of the r.h.s.

B. Proof of Theorem 3

In this subsection, using Theorem 9, we prove Theorem 3 when the cardinality $|\mathcal{X}|$ is finite. For this purpose, we show the following relations in the stationary discrete memoryless case, i.e., the case in which $W^n_x(y) = W^{\times n}_x(y) := \prod_{i=1}^{n} W_{x_i}(y_i)$ for $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, and $c_n(x) := \sum_{i=1}^{n} c(x_i)$. In this section, abbreviating $C_{W,c,K}$ as $C$, we will prove that



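$$\inf_{P: \mathrm{supp}(P^n) \subset \mathcal{X}_{n,c,K}} \lim_{\gamma \downarrow 0} I_p(R_2 - \gamma, C|P,W) \le \begin{cases} G(R_2/\sqrt{V^+_{W,c,K}}) & R_2 \ge 0 \\ G(R_2/\sqrt{V^-_{W,c,K}}) & R_2 < 0 \end{cases} \qquad (69)$$

and

$$\inf_{P: \mathrm{supp}(P^n) \subset \mathcal{X}_{n,c,K}} \sup_Q \lim_{\gamma \downarrow 0} J_p(R_2 - \gamma, C|P,Q,W) \ge \begin{cases} G(R_2/\sqrt{V^+_{W,c,K}}) & R_2 \ge 0 \\ G(R_2/\sqrt{V^-_{W,c,K}}) & R_2 < 0. \end{cases} \qquad (70)$$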

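Showing both inequalities and using Theorem 9, we obtain

$$C_p(R_2, C|W,c,K) = \begin{cases} G(R_2/\sqrt{V^+_{W,c,K}}) & R_2 \ge 0 \\ G(R_2/\sqrt{V^-_{W,c,K}}) & R_2 < 0. \end{cases} \qquad (71)$$

Since the r.h.s. of (71) is continuous with respect to $R_2$, (71) implies that

$$C(\epsilon, C|W,c,K) = \begin{cases} \sqrt{V^+_{W,c,K}}\, G^{-1}(\epsilon) & \epsilon \ge 1/2 \\ \sqrt{V^-_{W,c,K}}\, G^{-1}(\epsilon) & \epsilon < 1/2. \end{cases}$$

That is, we can show Theorem 3.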

The inequality (70) can be proven in the same way as (67) by replacing $T_n$ and $Q_M$ by the set of empirical distributions $T_{n,c,K} := \{P \in T_n \mid E_P c(x) \le K\}$ and $Q_{M,c,K}$. Therefore, the converse part of Theorem 3 holds. Therefore, it is sufficient to prove the direct part (69).

For any distribution $P$ satisfying $E_P c(x) \le K$, we choose the closest empirical distribution $P_n \in T_{n,c,K}$. Let $P = \{P^n\}$ be the uniform distributions on the set $T_{P_n} := \{x \in \mathcal{X}^n \mid \mathrm{ep}(x) = P_n\}$. It is sufficient to show that



IpiR, C\P, W) < G{R/y/V^). (72) 
P"ix) < |r„|(P„)X"(a;), (73) 



Ip{R,C\P,W) 

= limsupPp„,v^xJ^ log-^-f(-nC) <R 



Since 



we have 



which implies (|72] ). 

In order to prove ( |72] | without condition jA"! < 00, we choose a sequence of input distributions {P^^}'^^ with finite 
supports such that 

I(P^\W)^ max /(P,m 
Choose the distribution P" as the uniform distributions on the set T 1 . Then, in stead of ( |73] ). the relation 

P("5) 

P"(x) < (n+ l)"*(p("*))^"(x) 
holds. Since log(n + 1)^^^ goes to zero, the same discussion as ( |74l t yields ( l72b . 
C. Proof of Theorem 5

As is shown in Subsection IX-BI we obtain the direct part, i.e., 

C,^(a,C^,s|7V,5) < G(a/vA^). 

Hence, when c„(a;) = 'Y^^=i sufficient to prove 



inf snplimJp{R2--/,Ri\P,Q,W)>G{a/^/V^^). (75) 

P:supp(P„)CAr„,e,s Q llO 



In the following discussion, we use the distribution 



and the sets 



We obtain 



Ve = {P|Epa;2 < 5 - e} 
0„ =^ {x e A-" |ep(a;) G V, } . 



- nCj < R 

log 2 - nC j < i? 
log2-nC) < R 



When X e V^, the random variable ^log ^^y^-° )^^(y) ^*^S2 — ^^C^ has the expectation 

^ (^f log(l + ^)+ I^Xg - f log(l + I) + log2j (< 12^ _ log ^), and the variance (< 
g-e Ti— )■ Thus, Chebyshev inequality implies 

2(1+^)2 



p I V" 1„„ l+~ log 2 



When a; e Vp, under the n-variable Gaussian distribution W^"^, the random variable log > is calculated to be 

S\\yf 2x-y \\x\\^\ n S , 



2(i + |.)V iV2 ' TV ' N J 2^°^''^^ N^' 



The expectation is " ^.n" + f log(l + jr), and the variance is — " . The random variable 



24 



2(1 + ^) 



^ log(l + jf) I converges the normal distribution when n goes to infinity. Due to 



the property of Gaussian distribution, this convergence is uniform when ||a;|| is bounded. Hence, 



Therefore, 



^ log 




log 2 - ?iC ) <R 
log 2 



_ S_ 

N "'AT 



2(1 



N 



R<0 



R>0. 



limsupPp^ ^x,i <^ —= log 



> < 




C] <R 



R>0 



Since e > is arbitrary, when Q — {Q^}, 

Jp{R,C\P,Q,W 

= lim sup Pp7^^/xn 



log 



>G 



R 



Quiv) 



nC]<R 



I S2 



4-4 — 



which implies (TTST i. 



XI. Concluding remarks and future study 

We have obtained a general asymptotic formula for channel coding in the sense of the second-order coding rate. That is, it has 
been shown that the optimum second-order transmission rate with the error probability e is characterized by the second-order 
asymptotic behavior of the logarithmic likelihood ratio between the conditional output distribution and the non-conditional 
output distribution. Using this result, we have derived this type of optimal transmission rate for the discrete memoryless case, 
the discrete memoryless case with a cost constraint, the additive Markovian case, and the Gaussian channel case with an energy 
constraint. The performance in the second-order coding rate is characterized by the average of the variance of the logarithmic 
likelihood ratio with the single letterized expression. When the input distribution producing the capacity is not unique, it is 
characterized by its minimum and its maximum. We give a typical example such that the minimum is different from the 
maximum. Furthermore, both quantities have been verified to satisfy the additivity. 

The main results of the present study are as follows. While the application of the information spectrum method to the 
second-order coding rate was initiated by Hayashi [6], his research indicated that there is no difficulty in extending general 
formulas to the second-order coding rate. Therefore, in the i.i.d. case, the second-order coding rate of the source coding and 
intrinsic randomness are solved by the central limit theorem. However, channel coding cannot be treated using the method
of Hayashi [6] except for the additive noise case with no cost constraint because the present problem contains the optimization
concerning the input distribution in the non-additive noise case. In the converse part, we have to treat the general sequence of 
input distributions. In order to resolve this difficulty, we have treated the logarithmic likelihood ratio between the conditional 
output distribution and the distribution QJ), which is introduced in Subsection IX- Al 

Furthermore, we can consider the quantum extension of our results. There is considerable difficulty concerning non-commutativity in this direction. In addition, the third-order coding rate is expected but appears difficult. The second order is the order $\sqrt{n}$, and it is not clear whether the third order is a constant order or the order $\log n$. This is an interesting problem for future study.
Appendix



For a given i? < 0, we prove ( fTSl l. Since ^jp^(s) > 0, the function i/'p is convex. Choosing s„ such that + ^ 
-^(^n) = -^(0) - /o'" ^(Orft we have the relation 



^ (t)dt. (76) 



Then, the minimum of Cj^'^s-|-^s+-!/'p(s) is attained when s = s„. Since ^-^^{s) is continuous and bounded, s„ approaches 
zero as n goes to infinity. More precisely, ( f76] l implies i?2 = — limn^oo \/n Jq" ^^-j^{t)dt = — lim„^oo(v^'Sn) (0)- 

-(0)' 



That is, lim„^oo(\Ai*n) — d^Jl^ , ■ When the function e(?i) is chosen to be ^-^^(u) — ^J' (0), t{u) approaches zero as u 



goes to zero. 
Thus, we have 



n mm 

0<s<l 



2 ds^ ' ' Jo Jo ' ' 2^(0)^ 
which implies (fTSl ). 